SQL delete or keep only one occurrence of the row in table-CodePudding

I have table which has only two columns:

Message_id
Id (primary key)

I want to delete the duplicate from the table but keep the first occurrence or any one occurrence of the row.

The duplicate is determined by using Message_id.

I use this query to find all the duplicate values

(select message_id
 from myDb.My_Tb
 group by message_id 
 having count(message_id) > 1)

I used this below query to get all the row_numbers but the problem is if I create this as another table and use MOD(r,2) = 0 . It doesn't gives me the exact 2nd row but a random 2nd row.

(select row_number() over() as rn, id, message_id 
 from MyDb.My_Tb
 where message_id in (select message_id
                      from MyDb.My_Tb
                      group by message_id 
                      having count(message_id) > 1)
 order by message_id asc)

Thanks for the responses but I was able to do this with this query

delete from MyDb.My_Tb where id in
(select id  from 
(select row_number () over() as rownum, id , message_id
from

(
    (select id ,  message_id from MyDb.My_Tb
     where message_id in

        (select message_id
        from MyDb.My_Tb 
        group by message_id 
        having count(message_id)>1)

     order by message_id asc) 
) as dummy
) as dummy1

where mod(rownum,2) = 0)

because in my case I had only duplicates in numbers of two. Is there a way that we can generalise this for 'N' duplicates?

CodePudding user response：

Use this query to delete all the duplicate records except one:

DELETE FROM myDb.My_Tb
WHERE id NOT IN (
  SELECT MIN(id)
  FROM myDb.My_Tb
  GROUP BY Message_id
)

Or you can use:

DELETE FROM myDb.My_Tb t1
WHERE EXISTS (
    SELECT *
    FROM myDb.My_Tb t2
    WHERE t1.Message_id = t2.Message_id AND t1.id > t2.id
);

CodePudding user response：

Yes, you can modify the query to delete the duplicate rows and keep the first occurrence for any number of duplicates.

To do this, you can use the following query:

DELETE FROM MyDb.My_Tb
WHERE Id IN (
    SELECT Id
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY Message_id ORDER BY Id) AS RowNumber, Id
        FROM MyDb.My_Tb
    ) AS t
    WHERE RowNumber > 1
);

This query uses the ROW_NUMBER() function to assign a unique number to each row within each Message_id group. It then selects all rows with a RowNumber greater than 1, and deletes them from the table. This will retain the first occurrence of each duplicate Message_id value, and delete all subsequent occurrences.