Home > database >  Keep only first row given a set of columns in PostgreSQL
Keep only first row given a set of columns in PostgreSQL

Time:04-17

I have a table that looks like this:

id creation_date col1 col2 col3
1 21 April a b e
2 23 April a b e
3 21 April c d e

My goal is to delete all the duplicates, keeping only the first (ie first created) row.

The duplicates are defined using a set of columns from the table. For example, using col1 and col2, the result will be :

id creation_date col1 col2 col3
1 21 April a b e
3 21 April c d e

One answer that I found use one column as a key :

DELETE FROM dups a USING (
      SELECT MIN(ctid) as ctid, key
        FROM dups 
        GROUP BY key HAVING COUNT(*) > 1
      ) b
      WHERE a.key = b.key 
      AND a.ctid <> b.ctid

, but I can't see how to adapt to the case where multiple columns are used, and where the earliest record should be kept.

CodePudding user response:

You can try to use DELETE and EXISTS with your logic in subquery condition.

DELETE 
FROM dups t
WHERE EXISTS (SELECT 1 
               FROM dups tt
               WHERE tt.col1 =  t.col1
               AND tt.col2 =  t.col2
               AND tt.col3 =  t.col3
               HAVING MIN(tt.ctid) <> t.ctid
               ); 

sqlfiddle

CodePudding user response:

You can use an implicit self join of the table:

DELETE FROM dups d1
USING dups d2
WHERE (d2.col1, d2.col2, d2.col3) = (d1.col1, d1.col2, d1.col3)
  AND d2.creation_date < d1.creation_date;

See the demo.

  • Related