Home > Software engineering >  Removing near identical values from mysql table
Removing near identical values from mysql table

Time:11-30

Is there a way of removing near identical values from a table in mysql? My table has records more than 10K out of which one of the company looks like this:

id    name
123   Vianet Group Inc
5214  Vianet Group, Inc.

on using describe tablename I get this:

Field Type Null Key Default Extra
id  int NO  PRI     auto_increment
name    varchar(150)    NO  UNI     

the names of the company are same however I would like to delete the second instance from table, thereby keeping just a single instance of the name in the table. This is just one instance and there are others like these.. Is there a swift way of removing identical values from the column? Please help.

CodePudding user response:

You could try using soundex to find the "near identical" values -

SELECT *
FROM tablename t1
INNER JOIN tablename t2
    ON t1.id < t2.id
    AND SOUNDEX(t1.name) = SOUNDEX(t2.name)

You will need to test it with some of your example "near identical" values to see what it does and does not work for. As suggested by Akina you will probably need to go for some kind of normalisation process (stored function) or the Levenshtein distance function linked by Slava.

  • Related