I have a table with partial duplicate rows and I want to use the partial matches across the rows to set a value to NULL.
In this particular case, if the 'Name', 'Age', and 'Profession' columns are the same and the 'School' column is different, I want to set the 'School' column to NULL.
This is what my table looks like now:
Name Age Profession Salary School
John 21 Teacher 50,000 A
Lisa 24 Engineer 75,000 B
John 21 Teacher 55,000 C
This is what I want it to look like:
Name Age Profession Salary School
John 21 Teacher 50,000 NULL
Lisa 24 Engineer 75,000 B
John 21 Teacher 55,000 NULL
How can I create a query that does this? Thanks!
CodePudding user response:
Ideally there should be some kind of primary key in your table. That lacking, we can try the following update join approach:
UPDATE yourTable t1
INNER JOIN
(
SELECT Name, Age, Profession
FROM yourTable
GROUP BY Name, Age, Profession
HAVING COUNT(*) > 1
) t2
ON t2.Name = t1.Name AND
t2.Age = t1.Age AND
t2.Profession = t1.Profession
SET
School = NULL;
CodePudding user response:
The table joins itself and compare the required conditions
UPDATE mytable a
JOIN mytable b
ON a.Name = b.Name
AND a.Age = b.Age
AND a.Profession = b.Profession
AND COALESCE(a.School, '') != COALESCE(b.School, '')
SET a.School = NULL
;
Testing Set:
CREATE TABLE mytable (
`Name` VARCHAR(100),
`Age` INT,
`Profession` VARCHAR(100),
`Salary` INT,
`School` VARCHAR(100)
);
INSERT INTO mytable VALUES
( 'John', 21, 'Teacher', 50000, 'A' ),
( 'John', 21, 'Teacher', 50000, 'A' ),
( 'Lisa', 24, 'Engineer', 75000, 'B' ),
( 'Lisa', 24, 'Engineer', 75000, NULL ),
( 'Lisa1', 24, 'Engineer', 75000, 'B' ),
( 'Lisa1', 24, 'Engineer', 75000, 'B' ),
( 'Lisa1', 24, 'Engineer', 75000, 'B' ),
( 'John', 21, 'Teacher', 55000, 'C' )
;
CodePudding user response:
SELECT Name, Age, Profession, Salary,
CASE WHEN COUNT(School) OVER (PARTITION BY Name, Age, Profession) = 1
THEN School
END AS School
FROM sourcetable