Why the wrong query is wrong?
-- Correct
SELECT first_name, count(*)
FROM customer
GROUP BY first_name
ORDER BY count(*) DESC
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
source: https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/
CodePudding user response:
Issue
Given:
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
What was the error responded by SQL parser?
Presumably similar to:
ERROR: column "customer.last_name" must appear in the GROUP BY clause or be used in an aggregate function Position: 20
(Run the SQL and see this error output from PostgreSQL in SQLfiddle demo)
So just make sure all columns from your SELECT list, that are not aggregate-functions like count
or sum
are present in the GROUP BY list, they form the groups for which is aggregated.
Working GROUP BY
-- Correct
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name
See it working in SQLfiddle demo resulting in counting duplicates for the names.
CodePudding user response:
I couldn't understand your explanation in comments, thus explaining here:
In first query, you are selecting columns First_Name and aggregating them into virtual buckets (group by First_Name). Second column is an aggregation function counting how many of them in given bucket. So for example:
First_Name, Last_Name
John, Doe
John, Carpenter
Frank, Sinatra
Frank, Doe
Frank, Short
You do a grouping by First_Name (and select it in the select list), you would have rows only:
First_Name
John
Frank
Adding an aggregate function (count in your case), it turns out (* in count means count rows with no particular column):
First_Name, Count(*)
John, 2
Frank, 3
Now if you consider second query:
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
For the result set:
First_Name, Last_Name, Count(*)
John, ????, 2
Frank, ????, 3
There isn't something that tells where the content for Last_Name come from (thus it would be a bug to include it).
If you wrote it as:
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name
It would be ok. Now your problem would be getting more than you need:
First_Name, Last_Name, Count(*)
John, Doe, 1
John, Carpenter, 1
Frank, Sinatra, 1
Frank, Doe, 1
Frank, Short, 1
That would at least reveal those having duplicated and could be useful in cases.