I have table which has only two columns:
Message_id
Id
(primary key)
I want to delete the duplicate from the table but keep the first occurrence or any one occurrence of the row.
The duplicate is determined by using Message_id
.
I use this query to find all the duplicate values
(select message_id
from myDb.My_Tb
group by message_id
having count(message_id) > 1)
I used this below query to get all the row_numbers but the problem is if I create this as another table and use MOD(r,2) = 0 . It doesn't gives me the exact 2nd row but a random 2nd row.
(select row_number() over() as rn, id, message_id
from MyDb.My_Tb
where message_id in (select message_id
from MyDb.My_Tb
group by message_id
having count(message_id) > 1)
order by message_id asc)
Thanks for the responses but I was able to do this with this query
delete from MyDb.My_Tb where id in
(select id from
(select row_number () over() as rownum, id , message_id
from
(
(select id , message_id from MyDb.My_Tb
where message_id in
(select message_id
from MyDb.My_Tb
group by message_id
having count(message_id)>1)
order by message_id asc)
) as dummy
) as dummy1
where mod(rownum,2) = 0)
because in my case I had only duplicates in numbers of two. Is there a way that we can generalise this for 'N' duplicates?
CodePudding user response:
Use this query to delete all the duplicate records except one:
DELETE FROM myDb.My_Tb
WHERE id NOT IN (
SELECT MIN(id)
FROM myDb.My_Tb
GROUP BY Message_id
)
Or you can use:
DELETE FROM myDb.My_Tb t1
WHERE EXISTS (
SELECT *
FROM myDb.My_Tb t2
WHERE t1.Message_id = t2.Message_id AND t1.id > t2.id
);
CodePudding user response:
Yes, you can modify the query to delete the duplicate rows and keep the first occurrence for any number of duplicates.
To do this, you can use the following query:
DELETE FROM MyDb.My_Tb
WHERE Id IN (
SELECT Id
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Message_id ORDER BY Id) AS RowNumber, Id
FROM MyDb.My_Tb
) AS t
WHERE RowNumber > 1
);
This query uses the ROW_NUMBER()
function to assign a unique number to each row within each Message_id group. It then selects all rows with a RowNumber greater than 1, and deletes them from the table. This will retain the first occurrence of each duplicate Message_id value, and delete all subsequent occurrences.