Say I have a query:
SELECT column_1, column_2 max(column_3) as max_column_3
FROM table_1
GROUP BY column_1, column_2
and I'm interested in returning the id
of each row, instead of just column_1, column_2 max(column_3). When I try:
SELECT id, column_1, column_2 max(column_3) as max_column_3
FROM table_1
GROUP BY column_1, column_2
It gives me an error, because the id
is not denoted in the GROUP BY clause. Note that (column_1, column_2, column_3)
is not sufficient to uniquely identify a particular row.
How do I go about getting the id
of each row returned row?
CodePudding user response:
You'll want to treat the result of your first query as a subquery, and join that back to your original data.
For example,
WITH SUBQUERY_MAX AS (
SELECT
column_1,
column_2,
max(column_3) as max_column_3
FROM
table_1
GROUP BY
column_1,
column_2
)
SELECT
ORIGINAL.id,
ORIGINAL.column_1,
ORIGINAL.column_2,
ORIGINAL.column_3
FROM
ORIGINAL
INNER JOIN SUBQUERY_MAX ON ORIGINAL.column_1 = SUBQUERY_MAX.column_1
AND ORIGINAL.column_2 = SUBQUERY_MAX.column_2
AND ORIGINAL.column_3 = SUBQUERY_MAX.max_column_3
By joining it back to the original data on the 3 columns, you can identify the row that was the max.
There is a pitfall with this method that happens when there is a tie for the maximum. The subquery would have 1 row but the original might have more than 1 row. This will give more than 1 row in your final result as well.
If you must pick 1 row, you can use a window function such as DENSE_RANK()
or ROW_NUMBER()
to create a rank column... and if postgres supports it you can filter that with a QUALIFY
statement.
For example,
SELECT *,
DENSE_RANK() OVER(
PARTITION BY column_1, column_2
ORDER BY column_3 DESC
) AS RANK_COLUMN_3
FROM table_1
QUALIFY RANK_COLUMN_3=1
I recommend learning the concept of using a subquery and joining the results to the original - this technique will come in handy for many different types of problems in the future.
CodePudding user response:
As far as i have understood your question, you want to return Id column in your query as well as other columns, for that you should add Id in group by clause as well, as Id is not part of any aggregate function, so it needs to be part of group by For example:
SELECT id, column_1, column_2, max(column_3) as max_column_3
FROM table_1
GROUP BY id, column_1, column_2
I hope this helps