Obtaining original ID after aggregate operation (SQL)

Say I have a query:

SELECT column_1, column_2, max(column_3) as max_column_3
FROM table_1
GROUP BY column_1, column_2

and I'm interested in also returning the id of each row, rather than just column_1, column_2, max(column_3). When I try:

SELECT id, column_1, column_2, max(column_3) as max_column_3
FROM table_1
GROUP BY column_1, column_2

It gives me an error, because id is neither listed in the GROUP BY clause nor used inside an aggregate function. Note that (column_1, column_2, column_3) is not sufficient to uniquely identify a particular row.

How do I go about getting the id of each returned row?

Answer:

You'll want to treat the result of your first query as a subquery, and join that back to your original data.

For example,

WITH SUBQUERY_MAX AS (
  SELECT 
    column_1, 
    column_2, 
    max(column_3) as max_column_3 
  FROM 
    table_1 
  GROUP BY 
    column_1, 
    column_2
) 
SELECT 
  ORIGINAL.id, 
  ORIGINAL.column_1, 
  ORIGINAL.column_2, 
  ORIGINAL.column_3 
FROM 
  table_1 AS ORIGINAL 
  INNER JOIN SUBQUERY_MAX ON ORIGINAL.column_1 = SUBQUERY_MAX.column_1 
  AND ORIGINAL.column_2 = SUBQUERY_MAX.column_2 
  AND ORIGINAL.column_3 = SUBQUERY_MAX.max_column_3

By joining the aggregated result back to the original table on all three columns (column_1, column_2, and column_3 against max_column_3), you can recover the id of the row that holds the maximum.

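There is a pitfall with this method when there is a tie for the maximum. If two or more rows in the same (column_1, column_2) group share the maximum column_3, the subquery still has 1 row for that group, but the join matches every tied row in the original table, so your final result will contain more than 1 row for that group as well.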
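To see the join-back and the tie pitfall side by side, here is a small sketch that runs the query above against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite only stands in for your actual database, and the table layout and sample rows are made up for illustration. The alias is introduced with `FROM table_1 AS original` so the query is self-contained.

```python
import sqlite3

# In-memory SQLite table standing in for table_1 (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 ("
    "id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT, column_3 INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", 10),
        (2, "a", "x", 30),  # tied maximum for group (a, x)
        (3, "a", "x", 30),  # tied maximum for group (a, x)
        (4, "b", "y", 5),
        (5, "b", "y", 7),   # unique maximum for group (b, y)
    ],
)

# Aggregate in a CTE, then join back to the original rows to recover their ids.
JOIN_BACK = """
WITH subquery_max AS (
  SELECT column_1, column_2, MAX(column_3) AS max_column_3
  FROM table_1
  GROUP BY column_1, column_2
)
SELECT original.id, original.column_1, original.column_2, original.column_3
FROM table_1 AS original
INNER JOIN subquery_max
  ON original.column_1 = subquery_max.column_1
 AND original.column_2 = subquery_max.column_2
 AND original.column_3 = subquery_max.max_column_3
ORDER BY original.id
"""

rows = conn.execute(JOIN_BACK).fetchall()
for row in rows:
    print(row)
```

Group (a, x) comes back twice, once for id 2 and once for id 3, because both rows share the maximum column_3 of 30; group (b, y) returns only id 5.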

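If you must pick exactly 1 row per group, you can use a window function such as DENSE_RANK() or ROW_NUMBER() to create a rank column. Be aware that DENSE_RANK() gives tied rows the same rank, so filtering on rank 1 still keeps every tie; ROW_NUMBER() numbers the rows 1, 2, 3, ... and, with a tiebreaker such as id in the ORDER BY, keeps exactly one. In databases that support it (for example Snowflake, Teradata or DuckDB) you can filter that column with a QUALIFY clause. PostgreSQL does not support QUALIFY, so there you wrap the window query in a subquery and filter with WHERE in the outer query.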

For example,

SELECT *,
  ROW_NUMBER() OVER(
    PARTITION BY column_1, column_2 
    ORDER BY column_3 DESC, id
  ) AS RANK_COLUMN_3
FROM table_1
QUALIFY RANK_COLUMN_3=1
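Without QUALIFY (as in PostgreSQL), the usual workaround is to compute the rank in a subquery and filter it with WHERE in the outer query. The sketch below does that against a made-up table in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer) and compares ROW_NUMBER() with DENSE_RANK(). Using id as the tiebreaker is an assumption; order by whatever column decides which tied row you want to keep.

```python
import sqlite3

# Made-up sample data with a tie for the maximum in group (a, x).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 ("
    "id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT, column_3 INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", 10),
        (2, "a", "x", 30),
        (3, "a", "x", 30),
        (4, "b", "y", 5),
        (5, "b", "y", 7),
    ],
)


def top_per_group(rank_expr):
    # rank_expr is a fixed string from this script, not user input,
    # so interpolating it into the SQL text is safe here.
    query = f"""
    SELECT id, column_1, column_2, column_3
    FROM (
      SELECT id, column_1, column_2, column_3,
             {rank_expr} AS rnk
      FROM table_1
    ) AS ranked
    WHERE rnk = 1
    ORDER BY id
    """
    return conn.execute(query).fetchall()


# ROW_NUMBER with id as an assumed tiebreaker: exactly one row per group.
exactly_one = top_per_group(
    "ROW_NUMBER() OVER (PARTITION BY column_1, column_2 ORDER BY column_3 DESC, id)"
)
# DENSE_RANK gives tied rows the same rank, so every tied maximum survives.
with_ties = top_per_group(
    "DENSE_RANK() OVER (PARTITION BY column_1, column_2 ORDER BY column_3 DESC)"
)
print(exactly_one)
print(with_ties)
```

With ROW_NUMBER() the tie in group (a, x) is broken in favour of id 2, so each group yields one row; with DENSE_RANK() ids 2 and 3 both get rank 1 and both are returned.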

I recommend learning the concept of using a subquery and joining the results to the original - this technique will come in handy for many different types of problems in the future.

Answer:

As far as I understand your question, you want to return the id column in your query along with the other columns. Because id is not part of any aggregate function, it also needs to be part of the GROUP BY clause. For example:

SELECT id, column_1, column_2, max(column_3) as max_column_3
FROM table_1
GROUP BY id, column_1, column_2
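Here is a quick sketch of what this query returns, assuming id is unique (for example a primary key). It runs against a made-up in-memory SQLite table via Python's sqlite3 module. When id is unique, every (id, column_1, column_2) group contains exactly one row, so max_column_3 is simply that row's column_3 and the result has one row per id rather than one row per (column_1, column_2).

```python
import sqlite3

# Made-up sample data; id is assumed to be unique (a primary key).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 ("
    "id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT, column_3 INTEGER)"
)
sample = [
    (1, "a", "x", 10),
    (2, "a", "x", 30),
    (3, "a", "x", 30),
    (4, "b", "y", 5),
    (5, "b", "y", 7),
]
conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?)", sample)

rows = conn.execute(
    """
    SELECT id, column_1, column_2, MAX(column_3) AS max_column_3
    FROM table_1
    GROUP BY id, column_1, column_2
    ORDER BY id
    """
).fetchall()

# Each group holds a single row, so every input row comes back unchanged.
for row in rows:
    print(row)
```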

I hope this helps