SQL remove duplicates - keep the row with least missing value, and fill missing value from duplicate-CodePudding

I have a table:

id	gender	county	dob
001	male	USA	NULL
001	male	NULL	NULL
002	female	FRANCE	NULL
002	NULL	NULL	28/01/1990

There are lots of duplicates that I'd like to remove based on column id. I want to keep the record with the least missing value, and at the same time fill missing values from other duplicated records.

Expected query output:

id	gender	county	dob
001	male	USA	NULL
002	female	FRANCE	28/01/1990

How can I do this in MySQL?

CodePudding user response：

You may try aggregating by the id and taking the max of all other columns:

SELECT id, MAX(gender) AS gender, MAX(country) AS country, MAX(dob) AS dob
FROM yourTable
GROUP BY id;

CodePudding user response：

SELECT CONCAT('SELECT id,',
              GROUP_CONCAT(CONCAT('MAX(',
                                  COLUMN_NAME,
                                  ') ',
                                  COLUMN_NAME)
                           ORDER BY ORDINAL_POSITION),
              ' FROM test GROUP BY id')
INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'test'
  AND COLUMN_NAME <> 'id';

PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c7deccefdcbaf2422bc177fb349ad080

Use this code in a batch (from the program - enable multi-query) or in stored procedure.

Replace id with real identifying column name (3 places).

Replace test with real table name (2 places).

Remove ORDER BY ORDINAL_POSITION if you do not need the output columns order to match the order in source table (but id will be always the first).

Replace DATABASE() with definite database name if needed.

You may use any valid name instead of @sql and stmt.