<PostgreSQL> How can I get all the columns of my table based only on a field having the most c...

For example, I have a product table with a value column, and I would like to display only the rows of the top 3 products based on the sum of their values.

Product	Date	Value
Product 1	2022-12-01	200
Product 1	2022-12-02	200
Product 2	2022-12-01	200
Product 2	2022-12-03	500
Product 3	2022-12-04	300
Product 3	2022-12-08	600
Product 4	2022-12-01	100
Product 4	2022-12-03	100
Product 5	2022-12-01	700
Product 5	2022-12-10	800

Based on the sample above, the sum for each product would be:

Product 1 - 400
Product 2 - 700
Product 3 - 900
Product 4 - 200
Product 5 - 1,500

And I would like to display only the values of the top 3 products (Products 5, 3, and 2).

Product	Date	Value
Product 2	2022-12-01	200
Product 2	2022-12-03	500
Product 3	2022-12-04	300
Product 3	2022-12-08	600
Product 5	2022-12-01	700
Product 5	2022-12-10	800

Until now, I first looked up the products with the highest sums and then used that result as a filter on my table. I'd like to do this in a single SQL query instead of running two separate queries.

SELECT product, count(value) as prod_count
FROM product
GROUP BY product
ORDER BY prod_count
LIMIT 3

CodePudding user response：

We can SUM (instead of COUNT) the value and GROUP BY the product.
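To see why COUNT does not help here, the following minimal sketch loads the question's sample rows into an in-memory SQLite database and compares COUNT(value) with SUM(value) per product. SQLite (through Python's standard sqlite3 module) is only an assumed stand-in for PostgreSQL, and the aliases row_count and total are illustrative names, not part of the original queries.

```python
import sqlite3

# Sample rows copied from the question; SQLite stands in for PostgreSQL here.
ROWS = [
    ("Product 1", "2022-12-01", 200), ("Product 1", "2022-12-02", 200),
    ("Product 2", "2022-12-01", 200), ("Product 2", "2022-12-03", 500),
    ("Product 3", "2022-12-04", 300), ("Product 3", "2022-12-08", 600),
    ("Product 4", "2022-12-01", 100), ("Product 4", "2022-12-03", 100),
    ("Product 5", "2022-12-01", 700), ("Product 5", "2022-12-10", 800),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product TEXT, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", ROWS)

# COUNT(value) counts the non-NULL rows per product; SUM(value) adds them up.
per_product = conn.execute(
    """
    SELECT product, COUNT(value) AS row_count, SUM(value) AS total
    FROM product
    GROUP BY product
    ORDER BY total DESC, product
    """
).fetchall()

for row in per_product:
    print(row)
```

Every product has two rows, so COUNT(value) is 2 for all of them and cannot rank the products; only SUM(value) reproduces the totals listed in the question.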

We can use FETCH FIRST 3 ROWS WITH TIES (available since PostgreSQL 13) so that, for example, two products sharing the third-highest sum are both returned.

So the entire query will be this one:

SELECT product, date, value 
FROM product 
WHERE product IN 
(SELECT product
FROM product
GROUP BY product
ORDER BY SUM(value) DESC 
FETCH FIRST 3 ROWS WITH TIES)
ORDER BY product, date;

Note that the table and column naming should be improved if possible: using the same name "product" for both the table and a column hurts readability.

Furthermore, the column "date" (which is actually an SQL keyword) would be better renamed to something more meaningful, for example "sellDate"; the same goes for the column "value".

Anyway, let's assume there is another product, "Product 6", which has the same summed value (700) as Product 2.

Then the above query will produce this outcome:

Product	Date	Value
Product 2	2022-12-01	200
Product 2	2022-12-03	500
Product 3	2022-12-04	300
Product 3	2022-12-08	600
Product 5	2022-12-01	700
Product 5	2022-12-10	800
Product 6	2022-12-01	600
Product 6	2022-12-10	100
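The tie behaviour can also be reproduced outside PostgreSQL. The sketch below rests on several assumptions: it uses Python's sqlite3 module as a stand-in, and because SQLite has no FETCH FIRST ... WITH TIES, it swaps in a different technique, keeping every product whose sum is at least the third-highest sum. For this data that matches the outcome shown above, but it is not the query from the answer.

```python
import sqlite3

# Rows from the question plus the hypothetical "Product 6" (600 + 100 = 700).
ROWS = [
    ("Product 1", "2022-12-01", 200), ("Product 1", "2022-12-02", 200),
    ("Product 2", "2022-12-01", 200), ("Product 2", "2022-12-03", 500),
    ("Product 3", "2022-12-04", 300), ("Product 3", "2022-12-08", 600),
    ("Product 4", "2022-12-01", 100), ("Product 4", "2022-12-03", 100),
    ("Product 5", "2022-12-01", 700), ("Product 5", "2022-12-10", 800),
    ("Product 6", "2022-12-01", 600), ("Product 6", "2022-12-10", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product TEXT, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", ROWS)

# Emulation (an assumption, not PostgreSQL syntax): keep every product whose
# total is >= the third-highest total, so a tie on third place keeps both.
tied_rows = conn.execute(
    """
    SELECT product, date, value
    FROM product
    WHERE product IN (
        SELECT product
        FROM product
        GROUP BY product
        HAVING SUM(value) >= (
            SELECT SUM(value)
            FROM product
            GROUP BY product
            ORDER BY SUM(value) DESC
            LIMIT 1 OFFSET 2
        )
    )
    ORDER BY product, date
    """
).fetchall()

products = sorted({row[0] for row in tied_rows})
print(products)  # ['Product 2', 'Product 3', 'Product 5', 'Product 6']
```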

If showing four products (or more, if more products share the third-highest sum) is not intended, we can simply use LIMIT 3 instead:

SELECT product, date, value 
FROM product 
WHERE product IN 
(SELECT product
FROM product
GROUP BY product
ORDER BY SUM(value) DESC 
LIMIT 3)
ORDER BY product, date;

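So we get only three products again, and one of the two products with the summed value 700 is left out. Which of them is dropped is not guaranteed, because the ORDER BY clause has no tie-breaker; here we assume it is product 2.

In that case, the result of this query would be: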

Product	Date	Value
Product 3	2022-12-04	300
Product 3	2022-12-08	600
Product 5	2022-12-01	700
Product 5	2022-12-10	800
Product 6	2022-12-01	600
Product 6	2022-12-10	100

If we want the choice to be deterministic, for example so that product 2 is returned rather than product 6, we can add the product as a second sort key in the ORDER BY clause:

SELECT product, date, value 
FROM product 
WHERE product IN 
(SELECT product
FROM product
GROUP BY product
ORDER BY SUM(value) DESC, product 
LIMIT 3)
ORDER BY product, date;

This will be the outcome for this query:

Product	Date	Value
Product 2	2022-12-01	200
Product 2	2022-12-03	500
Product 3	2022-12-04	300
Product 3	2022-12-08	600
Product 5	2022-12-01	700
Product 5	2022-12-10	800
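The effect of the tie-breaker can be checked with a small sketch. It assumes Python's sqlite3 module as a stand-in for PostgreSQL (both accept LIMIT inside the IN subquery) and includes the hypothetical Product 6 rows from above; the variable names are illustrative.

```python
import sqlite3

# Rows from the question plus the hypothetical "Product 6" (600 + 100 = 700).
ROWS = [
    ("Product 1", "2022-12-01", 200), ("Product 1", "2022-12-02", 200),
    ("Product 2", "2022-12-01", 200), ("Product 2", "2022-12-03", 500),
    ("Product 3", "2022-12-04", 300), ("Product 3", "2022-12-08", 600),
    ("Product 4", "2022-12-01", 100), ("Product 4", "2022-12-03", 100),
    ("Product 5", "2022-12-01", 700), ("Product 5", "2022-12-10", 800),
    ("Product 6", "2022-12-01", 600), ("Product 6", "2022-12-10", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product TEXT, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", ROWS)

# The second sort key (product) decides between the two products tied at 700,
# so the three products kept by LIMIT 3 are always the same.
picked = conn.execute(
    """
    SELECT product, date, value
    FROM product
    WHERE product IN (
        SELECT product
        FROM product
        GROUP BY product
        ORDER BY SUM(value) DESC, product
        LIMIT 3
    )
    ORDER BY product, date
    """
).fetchall()

for row in picked:
    print(row)
```

Because "Product 2" sorts before "Product 6", the tie at 700 is always resolved in favour of Product 2, regardless of how the rows are stored.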

We can try out here: db<>fiddle

CodePudding user response：

You could use the DENSE_RANK function in an aggregating subquery joined to your table, as follows:

Select P.Product, P.Date, P.Value
From Product P Join
(
  Select Product,
         DENSE_RANK() Over (Order By Sum(Value) Desc) rn
  From Product
  Group By Product
) T
On P.Product = T.Product
Where T.rn <= 3
Order By P.Product, P.Date

See a demo.
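How DENSE_RANK treats ties can be checked with a small sketch. It again assumes Python's sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer), adds the hypothetical Product 6 from the first answer, and compares ROW_NUMBER, RANK and DENSE_RANK over the per-product sums. The aliases total, row_num, rnk and dense_rnk are illustrative names.

```python
import sqlite3

# Rows from the question plus the hypothetical "Product 6" (600 + 100 = 700).
ROWS = [
    ("Product 1", "2022-12-01", 200), ("Product 1", "2022-12-02", 200),
    ("Product 2", "2022-12-01", 200), ("Product 2", "2022-12-03", 500),
    ("Product 3", "2022-12-04", 300), ("Product 3", "2022-12-08", 600),
    ("Product 4", "2022-12-01", 100), ("Product 4", "2022-12-03", 100),
    ("Product 5", "2022-12-01", 700), ("Product 5", "2022-12-10", 800),
    ("Product 6", "2022-12-01", 600), ("Product 6", "2022-12-10", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product TEXT, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", ROWS)

# Aggregate first, then rank the totals with three different window functions.
ranks = conn.execute(
    """
    SELECT product, total,
           ROW_NUMBER() OVER (ORDER BY total DESC, product) AS row_num,
           RANK()       OVER (ORDER BY total DESC)          AS rnk,
           DENSE_RANK() OVER (ORDER BY total DESC)          AS dense_rnk
    FROM (
        SELECT product, SUM(value) AS total
        FROM product
        GROUP BY product
    ) AS t
    ORDER BY total DESC, product
    """
).fetchall()

for row in ranks:
    print(row)
```

With the filter rn <= 3, DENSE_RANK returns both tied products (2 and 6), just like the WITH TIES query does for this data. The two are not equivalent in general: DENSE_RANK leaves no gap after a tie, so a tie for first place would let four products through, whereas RANK matches the WITH TIES behaviour and ROW_NUMBER with a tie-breaker matches LIMIT 3.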