Show only the max id record outa of a GROUPED BY list-CodePudding

(Simplified example). I have this table:

USER ID |  code   |   sales

 1          x          100
 1          y          200
 1          z          150

 2          x          300
 2          z          17
 2          y          100

For each user, I need to show just the record with the max sales.

USER ID |  code   |   sales

 1          y          200
 2          x          300

In the real table I have, It is millions of records. What would be the most efficient way of doing this?

Right now I find for each user the max sales record and join it unto itself to find the full record.

SELECT * from T
WHERE sales = (SELECT MAX(sales) FROM T WHERE user_id=outr.user_id)

Let's assume that sales number does not repeat itself for a specific user.

CodePudding user response：

Here's my attempt:

SELECT  T_with_maxsales.user_id,
        T_with_maxsales.code,
        T_with_maxsales.sales
FROM (SELECT T.*,
             MAX(sales) OVER(PARTITION BY user_id) AS max_sales
      FROM T) T_with_maxsales
WHERE sales = max_sales

CodePudding user response：

Refactor your dependent subquery into an independent subquery and then JOIN it.

SELECT T.* 
  FROM T
  JOIN (
          SELECT MAX(sales) max_sales, user_id
            FROM T
           GROUP BY user_id
       ) M   ON T.user_id = M.user_id
            AND T.sales = M.max_sales;

The subquery gets the largest sale for each user it. The JOIN operation's ON clause retrieves the detail record for each user's largest sale.

A multicolumn index on (user_id, sales) will make this query work efficiently when running on a large table.

This works on both mysql and postgresql.

CodePudding user response：

With Postgres the most efficient way is typically using distinct on()

select distinct on (user_id) *
from the_table
order by user_id, sales desc;