Home > Mobile >  SQL remove duplicates - keep the row with least missing value, and fill missing value from duplicate
SQL remove duplicates - keep the row with least missing value, and fill missing value from duplicate

Time:12-17

I have a table:

id gender county dob
001 male USA NULL
001 male NULL NULL
002 female FRANCE NULL
002 NULL NULL 28/01/1990

There are lots of duplicates that I'd like to remove based on column id. I want to keep the record with the least missing value, and at the same time fill missing values from other duplicated records.

Expected query output:

id gender county dob
001 male USA NULL
002 female FRANCE 28/01/1990

How can I do this in MySQL?

CodePudding user response:

You may try aggregating by the id and taking the max of all other columns:

SELECT id, MAX(gender) AS gender, MAX(country) AS country, MAX(dob) AS dob
FROM yourTable
GROUP BY id;

CodePudding user response:

SELECT CONCAT('SELECT id,',
              GROUP_CONCAT(CONCAT('MAX(',
                                  COLUMN_NAME,
                                  ') ',
                                  COLUMN_NAME)
                           ORDER BY ORDINAL_POSITION),
              ' FROM test GROUP BY id')
INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'test'
  AND COLUMN_NAME <> 'id';

PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c7deccefdcbaf2422bc177fb349ad080

Use this code in a batch (from the program - enable multi-query) or in stored procedure.

Replace id with real identifying column name (3 places).

Replace test with real table name (2 places).

Remove ORDER BY ORDINAL_POSITION if you do not need the output columns order to match the order in source table (but id will be always the first).

Replace DATABASE() with definite database name if needed.

You may use any valid name instead of @sql and stmt.

  • Related