I have this dataset where there are rows with duplicate personIDs but different pcpgrouper2. I only want to exclude "Unknown" where there are more than 1 of the same PersonID and one of them is "Unknown". If the ID has only one row with "Unknown" then keep that row.
Name | PersonID | PCPGrouper |
---|---|---|
ABBAN, AVRIL | 1094893 | Unknown |
ABBIS,CHLOE | 1114294 | T Docs |
ABBIS,CHLOE | 1114294 | Unknown |
CodePudding user response:
We can try using the following SQL query (assuming you are using SQL):
SELECT Name, PersonID, PCPGrouper
FROM yourTable t1
WHERE PCPGrouper != 'Unknown' OR
NOT EXISTS (
SELECT 1
FROM yourTable t2
WHERE t2.PersonID = t1.PersonID AND
t2.PCPGrouper != 'Unknown'
);
The above logic retains any person record whose grouper value is not unknown or is unknown but there is no other value.