How to delete duplicate rows in postgresql but retain the rows that reflect the most recently modifi-CodePudding

I am using the below to delete duplicate rows in my table as shown:

with a as(
(select id, date_published from public.dat order by 1)
except all
select distinct on (id, date_published)
id, date_published from public.dat order by 1)
delete from public.dat using a where dat.id = a.id returning *;

So, that might identify duplicate values like the below:

enter code here
ID     DATE_PUBLISHED          DATE_LAST_MODIFIED
007    2019-12-11 13:15:00     2019-09-19 13:40:00
007    2019-12-11 13:15:00     2022-07-11 21:15:00

However I actually don't want to delete everything that is returned.

I only want to delete the rows that have the oldest last modified date.

In this case, I want to delete:

    ID     DATE_PUBLISHED          DATE_LAST_MODIFIED
    007    2019-12-11 13:15:00     2019-09-19 13:40:00

or any other duplicates keep the most recent row.

The SQL I've used goes through the whole table and and deletes all duplicates, which is not my goal.

Can anyone help me with a means by which I can perhaps "tweak" this so it's smart enough to identify the max DATE_LAST_MODIFIED and delete anything else that has the same / duplicate ID?

CodePudding user response：

Try the following:

DELETE FROM
  dat T USING dat D
  WHERE T.id = D.id AND
        T.date_last_modified < D.date_last_modified

See a demo.

You may also use EXISTS as the following:

DELETE FROM
dat T WHERE EXISTS (SELECT 1 FROM dat D
                    WHERE T.id = D.id AND
                          T.date_last_modified < D.date_last_modified
                   )

See a demo.