I have a table that looks like this:
id | creation_date | col1 | col2 | col3 |
---|---|---|---|---|
1 | 21 April | a | b | e |
2 | 23 April | a | b | e |
3 | 21 April | c | d | e |
My goal is to delete all the duplicates, keeping only the first (ie first created) row.
The duplicates are defined using a set of columns from the table. For example, using col1 and col2, the result will be :
id | creation_date | col1 | col2 | col3 |
---|---|---|---|---|
1 | 21 April | a | b | e |
3 | 21 April | c | d | e |
One answer that I found use one column as a key :
DELETE FROM dups a USING (
SELECT MIN(ctid) as ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid
, but I can't see how to adapt to the case where multiple columns are used, and where the earliest record should be kept.
CodePudding user response:
You can try to use DELETE
and EXISTS
with your logic in subquery condition.
DELETE
FROM dups t
WHERE EXISTS (SELECT 1
FROM dups tt
WHERE tt.col1 = t.col1
AND tt.col2 = t.col2
AND tt.col3 = t.col3
HAVING MIN(tt.ctid) <> t.ctid
);
CodePudding user response:
You can use an implicit self join of the table:
DELETE FROM dups d1
USING dups d2
WHERE (d2.col1, d2.col2, d2.col3) = (d1.col1, d1.col2, d1.col3)
AND d2.creation_date < d1.creation_date;
See the demo.