Home > OS >  Exclude rows where ID is duplicated but 3rd column has "unknown
Exclude rows where ID is duplicated but 3rd column has "unknown

Time:09-09

I have this dataset where there are rows with duplicate personIDs but different pcpgrouper2. I only want to exclude "Unknown" where there are more than 1 of the same PersonID and one of them is "Unknown". If the ID has only one row with "Unknown" then keep that row.

Name PersonID PCPGrouper
ABBAN, AVRIL 1094893 Unknown
ABBIS,CHLOE 1114294 T Docs
ABBIS,CHLOE 1114294 Unknown

CodePudding user response:

We can try using the following SQL query (assuming you are using SQL):

SELECT Name, PersonID, PCPGrouper
FROM yourTable t1
WHERE PCPGrouper != 'Unknown' OR
      NOT EXISTS (
          SELECT 1
          FROM yourTable t2
          WHERE t2.PersonID = t1.PersonID AND
                t2.PCPGrouper != 'Unknown'
      );

The above logic retains any person record whose grouper value is not unknown or is unknown but there is no other value.

  • Related