Remove duplicated records in mysql-CodePudding

I have a table called leads with duplicated records

Leads: *account_id *campaign_id

I want to remove all the duplicated account_id with campaign_id equal to "51"

So if I have for example account_id = 1991 two times in the table I want to remove the one with campaign_id equal to "51" and keep the other one.

CodePudding user response：

You could use a delete join:

DELETE t1
FROM yourTable t1
INNER JOIN yourTable t2
    ON t2.account_id = t1.account_id AND
       t2.campaign_id <> 51
WHERE
    t1.campaign_id = 51;

CodePudding user response：

There's no problem to delete from a table provided that:

You use the correct syntax.
You have done a backup of the table BEFORE you do any deleting.

However, I would suggest a different method:

Create a new table based on the existing table:

CREATE TABLE mytable_new LIKE mytable;

Add unique constraint (or PRIMARY KEY) on column(s) you don't want to have duplicates:

ALTER TABLE mytable_new ADD UNIQUE(column1,[column2]);

Note: if you want to identify a combination of two (or more) columns as unique, place all the column names in the UNIQUE() separated by comma. Maybe in your case, the constraint would be UNIQUE(account_id, campaign_id).

Insert data from original table to new table:

INSERT IGNORE INTO mytable_new SELECT * FROM mytable;

Note: the IGNORE will insert only non-duplicate values that match with the UNIQUE() constraint. If you have an app that runs a MySQL INSERT query to the table, you have to update the query by adding IGNORE.

Check data consistency and once you're satisfied, rename both tables:

RENAME TABLE mytable TO mytable_old;
RENAME TABLE mytable_new TO mytable;

The best thing about this is that in case that you sense there's something wrong with the new table, you still have the original table. While changing the name of the tables only take less than a second, the probable issue here is that it might take a while to do the INSERT IGNORE if you have a large data.

Demo fiddle