Home > Enterprise >  SQL delete or keep only one occurrence of the row in table
SQL delete or keep only one occurrence of the row in table

Time:12-21

I have table which has only two columns:

  1. Message_id
  2. Id (primary key)

I want to delete the duplicate from the table but keep the first occurrence or any one occurrence of the row.

The duplicate is determined by using Message_id.

I use this query to find all the duplicate values

(select message_id
 from myDb.My_Tb
 group by message_id 
 having count(message_id) > 1)

I used this below query to get all the row_numbers but the problem is if I create this as another table and use MOD(r,2) = 0 . It doesn't gives me the exact 2nd row but a random 2nd row.

(select row_number() over() as rn, id, message_id 
 from MyDb.My_Tb
 where message_id in (select message_id
                      from MyDb.My_Tb
                      group by message_id 
                      having count(message_id) > 1)
 order by message_id asc)

Thanks for the responses but I was able to do this with this query

delete from MyDb.My_Tb where id in
(select id  from 
(select row_number () over() as rownum, id , message_id
from

(
    (select id ,  message_id from MyDb.My_Tb
     where message_id in

        (select message_id
        from MyDb.My_Tb 
        group by message_id 
        having count(message_id)>1)

     order by message_id asc) 
) as dummy
) as dummy1

where mod(rownum,2) = 0)

because in my case I had only duplicates in numbers of two. Is there a way that we can generalise this for 'N' duplicates?

CodePudding user response:

Use this query to delete all the duplicate records except one:

DELETE FROM myDb.My_Tb
WHERE id NOT IN (
  SELECT MIN(id)
  FROM myDb.My_Tb
  GROUP BY Message_id
)

Or you can use:

DELETE FROM myDb.My_Tb t1
WHERE EXISTS (
    SELECT *
    FROM myDb.My_Tb t2
    WHERE t1.Message_id = t2.Message_id AND t1.id > t2.id
);

CodePudding user response:

Yes, you can modify the query to delete the duplicate rows and keep the first occurrence for any number of duplicates.

To do this, you can use the following query:

DELETE FROM MyDb.My_Tb
WHERE Id IN (
    SELECT Id
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY Message_id ORDER BY Id) AS RowNumber, Id
        FROM MyDb.My_Tb
    ) AS t
    WHERE RowNumber > 1
);

This query uses the ROW_NUMBER() function to assign a unique number to each row within each Message_id group. It then selects all rows with a RowNumber greater than 1, and deletes them from the table. This will retain the first occurrence of each duplicate Message_id value, and delete all subsequent occurrences.

  • Related