Home > Enterprise >  MySql remove rows which have duplicate column content
MySql remove rows which have duplicate column content

Time:11-02

I get database table which contain postal numbers and regions for my country. That table have all information but i need to change it for my purpose.

I need to eliminate all rows that have duplicate content in specific column.

Check screenshot to see result enter image description here

I want to remove all duplicate rows which have postanski_broj (postal_number) the some. That number need to be unique. I try manualy to set that column to unique but i get duplicate entry when i try to execute statment.

  • ID is primary key with auto increment.
  • postanski_broj column is VARCHAR which represent postal_code
  • naselje column is VARCHAR which represent region

One region can have one postal_code

I try

ALTER  TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj);

00:03:20 ALTER TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj) Error Code: 1062. Duplicate entry '11158' for key 'idx_postanski_br' 0.118 sec

ALTER IGNORE TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj);

00:04:17 ALTER IGNORE TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj) Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IGNORE TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj)' at line 1 0.00037 sec

Anyone have sugestion? Thanks

CodePudding user response:

If you have other columns with different values than the ones you've shown there (except for id), deleting should be your last choice.

I usually would duplicate the table first:

CREATE TABLE poste_new LIKE poste;

add unique index to the newly created poste_new table:

ALTER  TABLE poste_new ADD UNIQUE INDEX idx_postanski_br (postanski_broj);

insert the data from poste into poste_new with IGNORE option to skip duplicates based on the unique index:

INSERT IGNORE INTO poste_new SELECT * FROM poste;

rename the tables:

RENAME TABLE poste TO poste_old;
RENAME TABLE poste_new TO poste;

The good thing about this is that you've minimized the risk of wrong delete and if you're not satisfied with the new table, you still have the old table intact - effectively making it a backup.

CodePudding user response:

This solution can take too much time for big tables. Best way of solving this is: Remove duplicate rows in MySQL

You have to delete the rows before applying the unique constraint. Be careful applying this:

DELETE p1 FROM poste p1
INNER JOIN poste p2 
WHERE 
    p1.id < p2.id AND 
    p1.postanski_broj = p2.postanski_broj;

This should remove the duplicated ones and will keep only the ones with the higher id (id=168044 in your example).

  • Related