Home > Mobile >  PostgreSQL: how to delete duplicated rows grouped by the value of a column?
PostgreSQL: how to delete duplicated rows grouped by the value of a column?

Time:06-09

Given the following table, I need to delete every row corresponding to a certain "id" whenever all these rows are duplicated in a successive "id". Note that the deletion all rows for a specific "id" should happen only in case that every row between the two ids match (with the exception of the different "id" column).

id name subject score
1 Ann Maths 9
1 Ann History 8
2 Ann Maths 9
2 Ann History 8
3 Ann Maths 9
3 Ann History 7
4 Bob Maths 8
4 Bob History 8

For this specific input, the updated output table should be:

id name subject score
1 Ann Maths 9
1 Ann History 8
3 Ann Maths 9
3 Ann History 7
4 Bob Maths 8
4 Bob History 8

This because all records between id 1 and 2 are the exactly the same. This doesn't apply for "id" 1 and 3, as long as there's at least one row not in common between the two (id 1 has 8 in History while id 3 has 7 in the same subject).

So it is not as simple as deleting duplicated rows. Here's my attempt:

DELETE FROM  table a
USING table b
WHERE a.name = b.name
AND a.subject = b.subject
AND a.score = b.score
AND a.ID < b.ID;

Can you help me?

CodePudding user response:

You can first get all ids that shouldn't be deleted and then exclude them in the WHERE clause of the DELETE statement.

Step 1. In order to match unique ids that are not repeated for all rows, you can use PostgreSQL DISTINCT ON construct, that will allows you to get every row that is not duplicated on the fields "name", "subject", "score". Then retrieve these ids only once with a simple DISTINCT.

SELECT DISTINCT id 
FROM (SELECT DISTINCT ON (name, subject, score) id
      FROM tab
      ORDER BY name, subject, score, id) ids_to_keep

Step 2. Hence you can build the DELETE statement using the NOT IN operator inside the WHERE clause:

DELETE FROM tab
WHERE id NOT IN (
    SELECT DISTINCT id 
    FROM (SELECT DISTINCT ON (name, subject, score) id
          FROM tab
          ORDER BY name, subject, score, id) ids_to_keep
);

Check the demo here.

  • Related