Why does this query need order by to work?-CodePudding

Why the wrong query is wrong?

-- Correct
SELECT first_name, count(*)
FROM customer
GROUP BY first_name
ORDER BY count(*) DESC
 
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name

source: https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/

CodePudding user response：

Issue

Given:

-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name

What was the error responded by SQL parser?

Presumably similar to:

ERROR: column "customer.last_name" must appear in the GROUP BY clause or be used in an aggregate function Position: 20

(Run the SQL and see this error output from PostgreSQL in SQLfiddle demo)

So just make sure all columns from your SELECT list, that are not aggregate-functions like count or sum are present in the GROUP BY list, they form the groups for which is aggregated.

Working GROUP BY

-- Correct
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name

See it working in SQLfiddle demo resulting in counting duplicates for the names.

CodePudding user response：

I couldn't understand your explanation in comments, thus explaining here:

In first query, you are selecting columns First_Name and aggregating them into virtual buckets (group by First_Name). Second column is an aggregation function counting how many of them in given bucket. So for example:

First_Name, Last_Name
John, Doe
John, Carpenter
Frank, Sinatra
Frank, Doe
Frank, Short

You do a grouping by First_Name (and select it in the select list), you would have rows only:

First_Name
John
Frank

Adding an aggregate function (count in your case), it turns out (* in count means count rows with no particular column):

First_Name, Count(*)
John, 2
Frank, 3

Now if you consider second query:

-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name

For the result set:

First_Name, Last_Name, Count(*)
John, ????, 2
Frank, ????, 3

There isn't something that tells where the content for Last_Name come from (thus it would be a bug to include it).

If you wrote it as:

SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name

It would be ok. Now your problem would be getting more than you need:

First_Name, Last_Name, Count(*)
John, Doe, 1
John, Carpenter, 1
Frank, Sinatra, 1
Frank, Doe, 1
Frank, Short, 1

That would at least reveal those having duplicated and could be useful in cases.