I am using the below to delete duplicate rows in my table as shown:
with a as(
(select id, date_published from public.dat order by 1)
except all
select distinct on (id, date_published)
id, date_published from public.dat order by 1)
delete from public.dat using a where dat.id = a.id returning *;
So, that might identify duplicate values like the below:
enter code here
ID DATE_PUBLISHED DATE_LAST_MODIFIED
007 2019-12-11 13:15:00 2019-09-19 13:40:00
007 2019-12-11 13:15:00 2022-07-11 21:15:00
However I actually don't want to delete everything that is returned.
I only want to delete the rows that have the oldest last modified date.
In this case, I want to delete:
ID DATE_PUBLISHED DATE_LAST_MODIFIED
007 2019-12-11 13:15:00 2019-09-19 13:40:00
or any other duplicates keep the most recent row.
The SQL I've used goes through the whole table and and deletes all duplicates, which is not my goal.
Can anyone help me with a means by which I can perhaps "tweak" this so it's smart enough to identify the max DATE_LAST_MODIFIED and delete anything else that has the same / duplicate ID?
CodePudding user response:
Try the following:
DELETE FROM
dat T USING dat D
WHERE T.id = D.id AND
T.date_last_modified < D.date_last_modified
See a demo.
You may also use EXISTS
as the following:
DELETE FROM
dat T WHERE EXISTS (SELECT 1 FROM dat D
WHERE T.id = D.id AND
T.date_last_modified < D.date_last_modified
)
See a demo.