I am using Postgres v12, and I have a table with duplicated rows. I need to retrieve only the last entry for each duplicate, ignoring entries which have no duplicate.
This table has the following columns:
- id (unique)
- request_id (where to find the duplicates)
- created_at (where to see which entry is the latest)
id | request_id | created_at |
---|---|---|
1 | a | 2020.06.06 |
2 | a | 2020.05.05 |
3 | b | 2020.04.04 |
4 | b | 2020.03.03 |
5 | c | 2020.04.04 |
6 | c | 2020.03.03 |
7 | d | 2020.03.03 |
The query should retrieve rows with id 1,3,5 , since they are the latest entry (created_at) of each duplicate. ID 7 has no duplicate, so it is ignored.
I have tried with the solution proposed here: https://www.geeksengine.com/article/get-single-record-from-duplicates.html but due to be using Postgres v12, those queries do not work, I get the error "column must appear in the group by clause" which is another problem cited here: must appear in the GROUP BY clause or be used in an aggregate function
I have been searching for a solution for days to this problem, but I am not an SQL expert. I would appreciate any help very much.
CodePudding user response:
here is one way using window functions :
select * from (
select *
, row_number() over (partition by request_id order by created_at desc) as rn
, count() over (partition by request_id) cn
from tablename
) t where cn > 1 and rn = 1