For SQL, how to use group by twice but only order by sum() within the 2nd group-CodePudding

The table records credit card transactions where each row is one record.

The columns are: transaction_id, customerID, dollar_spent, product_category.

How can I pick up the 3 customerID's from each product_category that have the highest dollar_spent within that category?

I was thinking about something like:

select product_category, customerID, sum(dollar_spent) 
from transaction
group by product_category, customerID
order by sum(dollar_spent) desc limit 3

but it failed to pass. Removing "limit 3" helped it to pass but the whole result is sorted solely by sum(dollar_spent), not by sum(dollar_spent) within each product_category.

Searched on StackOverflow but didn't find anything relevant. Could someone help me with this? Many thanks!!

CodePudding user response：

What I think you are looking for is a windowed function.

I added the row_number function to your select ordered by the amount of money spent, this basically ranks customer spending by product.

Then since you can not use a windowed function in a where clause (unfortunately) i had to create an outer query to filter on the row_number results

select * from( 
   select product_category, customerID, sum(dollar_spent), 
   row_number() over (partition by product_category order by sum(dollar_spent) desc) as rowNum
   from transaction
   group by product_category, customerID
) as data
where rowNum <= 3
order by product_category, rowNum

here is a fiddle