most efficient way to select duplicate rows with max timestamp

Time: 03-17

Suppose I have a table called t, which is like

id  content  time
1     'a'     100
1     'a'     101
1     'b'     102
2     'c'     200
2     'c'     201

Values of id are duplicated, and for the same id, content can also be duplicated. I want to select, for each id, the row with the max timestamp, which would be

id  content  time
1      'b'    102
2      'c'    201

And this is my current solution:

select t1.id, t1.content, t1.time 
from (
  select id, content, time from t 
) as t1 
right join (
  select id, max(time) as time from t group by id
) as t2 
on t1.id = t2.id and t1.time = t2.time;

But this looks inefficient to me, because in theory, by the time select id, max(time) as time from t group by id has executed, the rows I want have already been located. The right join seems to add an extra O(n^2) time cost, which feels unnecessary.

So is there a more efficient way to do this, or is there something I misunderstand?
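For reference, here is a minimal, self-contained sketch of the query from the question, using Python's sqlite3 on the sample data (an INNER JOIN stands in for the RIGHT JOIN, which is equivalent here because every id produced by the grouped subquery also exists in t; SQLite only added RIGHT JOIN in version 3.39):

```python
import sqlite3

# Build the sample table from the question in an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INTEGER, content TEXT, time INTEGER);
    INSERT INTO t VALUES
        (1, 'a', 100), (1, 'a', 101), (1, 'b', 102),
        (2, 'c', 200), (2, 'c', 201);
""")

# Join t back to its per-id maxima to keep only the latest row per id.
rows = con.execute("""
    SELECT t1.id, t1.content, t1.time
    FROM t AS t1
    INNER JOIN (
        SELECT id, MAX(time) AS time FROM t GROUP BY id
    ) AS t2
    ON t1.id = t2.id AND t1.time = t2.time
    ORDER BY t1.id;
""").fetchall()

print(rows)  # [(1, 'b', 102), (2, 'c', 201)]
```

Note that if two rows for the same id tie on the max time, this join returns both of them.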

CodePudding user response:

Use DISTINCT ON:

SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;

On Postgres, this is usually the most performant way to write your query, and it should outperform ROW_NUMBER and other approaches.
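DISTINCT ON is Postgres-specific. On engines that lack it, the ROW_NUMBER approach mentioned above gives the same result; a sketch using Python's sqlite3 (window functions require SQLite 3.25 or later):

```python
import sqlite3

# Recreate the sample table from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INTEGER, content TEXT, time INTEGER);
    INSERT INTO t VALUES
        (1, 'a', 100), (1, 'a', 101), (1, 'b', 102),
        (2, 'c', 200), (2, 'c', 201);
""")

# Number the rows within each id, newest first, and keep the first row
# of each partition -- one row per id, at its maximum time.
rows = con.execute("""
    SELECT id, content, time FROM (
        SELECT id, content, time,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
        FROM t
    ) ranked
    WHERE rn = 1
    ORDER BY id;
""").fetchall()

print(rows)  # [(1, 'b', 102), (2, 'c', 201)]
```

Unlike the join on (id, time), ROW_NUMBER guarantees exactly one row per id even when the max time is tied.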

CodePudding user response:

Try this:

SELECT a.id, a.content, a.time
FROM t AS a
INNER JOIN (
    SELECT id, MAX(time) AS time FROM t
    GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time;