Home > Enterprise >  Is the ordering of a GROUP BY with a MAX aggregate well defined?
Is the ordering of a GROUP BY with a MAX aggregate well defined?

Time:12-03

Let's assume I run the following in SQLite:

CREATE TABLE my_table
(
     id        INTEGER PRIMARY KEY,
     NAME      VARCHAR(20),
     date      DATE,
     num       INTEGER,
     important VARCHAR(20)
);

INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '2000-01-01', 10, 'Important 1');

INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '2000-02-01', 20, 'Important 2');

INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '1999-12-01', 30, 'Important 3');

The table looks like this:

id NAME date num important
1 A 2000-01-01 10 Important 1
2 A 2000-02-01 20 Important 2
3 A 1999-12-01 30 Important 3

If I execute:

SELECT id
FROM   my_table
GROUP  BY NAME;

the results are:

 ---- 
| id |
 ---- 
| 1  |
 ---- 

If I execute:

SELECT id, MAX(date)
FROM   my_table
GROUP  BY NAME;

The results are:

 ---- ------------ 
| id | max(date)  |
 ---- ------------ 
| 2  | 2000-02-01 |
 ---- ------------ 

And if I execute:

SELECT id,
       MAX(date),
       MAX(num)
FROM   my_table
GROUP  BY NAME;

The results are:

 ---- ------------ ---------- 
| id | max(date)  | max(num) |
 ---- ------------ ---------- 
| 3  | 2000-02-01 | 30       |
 ---- ------------ ---------- 

My question is, is this well defined? Specifically, am I guaranteed to always get id = 2 when doing the second query (with the single Max(date) aggregate), or is this just a side effect of how SQLite is likely ordering the table to grab the Max before grouping?

I ask this because I specifically do want id = 2. I will then execute another query that selects the important field for that row (for my actual problem the first query would return multiple ids and I'd select all important fields for all those rows at once.

Additionally, this is all happening in an iOS Core Data query, so I'm not able to do more complicated subqueries. If I knew that the ordering of a GROUP BY is defined by an aggregate then I'd feel pretty confident my queries wouldn't break (until Apple moves away from SQLite for Core Data).

Thanks!

CodePudding user response:

From the Sqlite manual

2.5. Bare columns in an aggregate query

The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:

SELECT a, b, sum(c) FROM tab1 GROUP BY a;

In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.

Special processing occurs when the aggregate function is either min() or max(). Example:

SELECT a, b, max(c) FROM tab1 GROUP BY a;

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way.

If bare columns appear in an aggregate query that lacks a GROUP BY clause, and the number of input rows is zero, then the values of the bare columns are arbitrary. For example, in this query:

SELECT count(*), b FROM tab1;

If the tab1 table contains no rows (of count(*) evaluates to 0) then the bare column "b" will have an arbitrary and meaningless value.

Most other SQL database engines disallow bare columns. If you include a bare column in a query, other database engines will usually raise an error. The ability to include bare columns in a query is an SQLite-specific extension.

https://www.sqlite.org/lang_select.html

CodePudding user response:

am I guaranteed to always get id = 2 when doing the second query (with the single Max(date) aggregate), or is this just a side effect of how SQLite is likely ordering the table to grab the Max before grouping?

Yes, the result that you get is guaranteed because it is documented in Bare columns in an aggregate query.

The value for the column id that you get is from the row that contains the max date.

  • Related