Suppose I have a table called t
, which is like
id content time
1 'a' 100
1 'a' 101
1 'b' 102
2 'c' 200
2 'c' 201
id
are duplicate, and for the same id
, content could also be duplicate. Now I want to select for each id the rows with max timestamp, which would be
id content time
1 'b' 102
2 'c' 201
And this is my current solution:
select t1.id, t1.content, t1.time
from (
select id, content, time from t
) as t1
right join (
select id, max(time) as time from t group by id
) as t2
on t1.id = t2.id and t1.time = t2.time;
But this looks inefficient to me. Because theoretically when select id, max(time) as time from t group by id
is executed, the rows I want have already been located. The right join
brings extra O(n^2) time cost, which seems unnecessary.
So is there any more efficient way to do it, or anything that I missunderstand?
CodePudding user response:
Use DISTINCT ON
:
SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;
On Postgres, this is usually the most performant way to write your query, and it should outperform ROW_NUMBER
and other approaches.
CodePudding user response:
Try this
SELECT a.id, a.content, a.time FROM t AS a
INNER JOIN (
SELECT a.content, MAX(a.time) AS time FROM t
GROUP BY a.content
) AS b ON a.content = b.content AND a.time = b.time