I have a table:
id | gender | county | dob |
---|---|---|---|
001 | male | USA | NULL |
001 | male | NULL | NULL |
002 | female | FRANCE | NULL |
002 | NULL | NULL | 28/01/1990 |
There are lots of duplicates that I'd like to remove based on column id. I want to keep the record with the least missing value, and at the same time fill missing values from other duplicated records.
Expected query output:
id | gender | county | dob |
---|---|---|---|
001 | male | USA | NULL |
002 | female | FRANCE | 28/01/1990 |
How can I do this in MySQL?
CodePudding user response:
You may try aggregating by the id
and taking the max of all other columns:
SELECT id, MAX(gender) AS gender, MAX(country) AS country, MAX(dob) AS dob
FROM yourTable
GROUP BY id;
CodePudding user response:
SELECT CONCAT('SELECT id,',
GROUP_CONCAT(CONCAT('MAX(',
COLUMN_NAME,
') ',
COLUMN_NAME)
ORDER BY ORDINAL_POSITION),
' FROM test GROUP BY id')
INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'test'
AND COLUMN_NAME <> 'id';
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c7deccefdcbaf2422bc177fb349ad080
Use this code in a batch (from the program - enable multi-query) or in stored procedure.
Replace id
with real identifying column name (3 places).
Replace test
with real table name (2 places).
Remove ORDER BY ORDINAL_POSITION
if you do not need the output columns order to match the order in source table (but id
will be always the first).
Replace DATABASE()
with definite database name if needed.
You may use any valid name instead of @sql
and stmt
.