
How To Fetch Rows With Max Update Datetime Using Group By And Having With Sqlalchemy And Postgresql

I'm going from SQLite to Postgresql. This has made one of my queries not work. It's not clear to me why this query is allowed in SQLite, but not in Postgresql. The query in questio

Solution 1:

SQLite accepts the query because it lets SELECT list items refer to ungrouped columns outside of aggregate functions, even when those columns are not functionally dependent on the grouping expressions. The non-aggregate values are picked from an arbitrary row in the group.

In addition, the SQLite documentation notes in a side note that "bare" columns in an aggregate query get special processing when the aggregate is min() or max():

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.

This only applies to simple queries, and the result is again ambiguous if more than one row has the same min/max, or if the query contains more than one call to min() / max().
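The bare-column behaviour can be observed directly with Python's built-in sqlite3 module. This is only a minimal sketch: the table name section_status and the status column are made up for illustration, while section_id, update_datetime and _id follow the names used in the question.

```python
import sqlite3

# In-memory database with a hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE section_status (
        _id INTEGER PRIMARY KEY,
        section_id INTEGER NOT NULL,
        update_datetime TEXT NOT NULL,
        status TEXT NOT NULL
    );
    INSERT INTO section_status (section_id, update_datetime, status) VALUES
        (1, '2020-01-01 10:00:00', 'old'),
        (1, '2020-01-02 10:00:00', 'new'),
        (2, '2020-01-03 10:00:00', 'only');
""")

# "status" is a bare column: it is neither grouped nor aggregated.
# SQLite accepts this and, because max() is the only aggregate,
# takes "status" from the row that holds the maximum.
rows = conn.execute("""
    SELECT section_id, status, max(update_datetime)
    FROM section_status
    GROUP BY section_id
    ORDER BY section_id
""").fetchall()

print(rows)
# [(1, 'new', '2020-01-02 10:00:00'), (2, 'only', '2020-01-03 10:00:00')]
```

Running the same SELECT against Postgresql fails with an error, because status is not in the GROUP BY clause and is not inside an aggregate.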

This behaviour makes SQLite non-conforming in this respect, at least with the SQL:2003 standard (I'm fairly certain that this has not changed much in the newer versions):

7.12 <query specification>

Function

Specify a table derived from the result of a <table expression>.

Format

<query specification> ::=
    SELECT [ <set quantifier> ] <select list> <table expression>

...

Conformance Rules

...

3) Without Feature T301, “Functional dependencies”, in conforming SQL language, if T is a grouped table, then in each <value expression> contained in the <select list>, each <column reference> that references a column of T shall reference a grouping column or be specified in an aggregated argument of a <set function specification>.

Most other SQL DBMS, such as Postgresql, follow the standard more closely in this respect, and require that the SELECT list of an aggregate query consist of only grouping expressions, aggregate expressions, or that any ungrouped columns are functionally dependent on the grouped columns.

In Postgresql a different approach is then required in order to fetch this kind of result. There are many great posts that cover this topic, but here's a summary of one Postgresql specific approach. Using the DISTINCT ON extension combined with ORDER BY you can achieve the same results:

@classmethod
def find_recent_by_section_id_list(
        cls, section_id_list: List) -> List["SectionStatusModel"]:
    return (
        cls.query
        .filter(cls.section_id.in_(section_id_list))
        .distinct(cls.section_id)
        # Use _id as a tie breaker, in order to avoid non-determinism
        .order_by(cls.section_id, cls.update_datetime.desc(), cls._id)
    )

Naturally this will then break in SQLite, as it does not support DISTINCT ON. If you need a solution that works in both, use the row_number() window function approach.
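For reference, here is a rough sketch of the row_number() approach written as plain SQL and run through Python's built-in sqlite3 module. It assumes an SQLite build with window function support (3.25 or newer). The table name section_status, the status column and the helper find_recent_by_section_ids are hypothetical and not from the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE section_status (
        _id INTEGER PRIMARY KEY,
        section_id INTEGER NOT NULL,
        update_datetime TEXT NOT NULL,
        status TEXT NOT NULL
    );
    INSERT INTO section_status (section_id, update_datetime, status) VALUES
        (1, '2020-01-01 10:00:00', 'old'),
        (1, '2020-01-02 10:00:00', 'new'),
        (2, '2020-01-03 10:00:00', 'a'),
        (2, '2020-01-03 10:00:00', 'b');  -- same timestamp: a tie
""")


def find_recent_by_section_ids(conn, section_ids):
    """Return the most recently updated row for each given section id."""
    section_ids = list(section_ids)
    if not section_ids:
        return []
    placeholders = ", ".join("?" for _ in section_ids)
    # Number the rows of each section from newest to oldest, using _id
    # as a tie breaker, then keep only the first row of each section.
    sql = f"""
        SELECT _id, section_id, update_datetime, status
        FROM (
            SELECT s.*,
                   row_number() OVER (
                       PARTITION BY section_id
                       ORDER BY update_datetime DESC, _id
                   ) AS rn
            FROM section_status AS s
            WHERE section_id IN ({placeholders})
        ) AS ranked
        WHERE rn = 1
        ORDER BY section_id
    """
    return conn.execute(sql, section_ids).fetchall()


latest = find_recent_by_section_ids(conn, [1, 2])
print(latest)
# [(2, 1, '2020-01-02 10:00:00', 'new'), (3, 2, '2020-01-03 10:00:00', 'a')]
```

The inner SELECT uses only standard window function syntax, so the same statement should also be accepted by Postgresql. In SQLAlchemy the ranking column can be built with func.row_number().over(partition_by=..., order_by=...) inside a subquery and then filtered on rn == 1.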

