Home > Mobile >  reconciling duplicates in one column with various values in another column
reconciling duplicates in one column with various values in another column

Time:12-08

I have a df that operates like this:

ID <- c("A", "B", "B", "C", "D", "D", "E", "F")
FLD<-c("GOOD","BAD","TERRIBLE", "BAD", "BAD","NA", "TERRIBLE", "NA")

df<-data.frame(ID, FLD)

I need to get it down so there's only rows with unique ID's. If ID's are duplicated, and the FLD value for one duplicate is BAD and one value is TERRIBLE, I want the one that says TERRIBLE to remain and the one that says BAD to drop. If, however, in the case of ID "D", the FLD values are BAD and NA, I want to keep BAD.

I'm not sure how to deal with it, because the actual dataset has tens of thousands of entries, so I can't just fix it by looking. Basically the dummy script I want is along the lines of:

If duplicated, and one duplicate is BAD is TERRIBLE, drop the row with BAD. But if one duplicate is BAD and one is NA, keep the one with BAD. So the outcome would be:

  ID      FLD
1  A     GOOD
2  B TERRIBLE
3  C      BAD
4  D      BAD
5  E TERRIBLE
6  F       NA

And I am lost.

CodePudding user response:

If we turn FLD into an ordered factor we can then use aggregate to get the minimum value per ID as follows:

df$FLD <- factor(df$FLD, levels=c("TERRIBLE", "BAD", "GOOD", "NA"), ordered = TRUE)
aggregate(data=df, FLD ~ ID, min)

  ID      FLD
1  A     GOOD
2  B TERRIBLE
3  C      BAD
4  D      BAD
5  E TERRIBLE
6  F       NA

CodePudding user response:

dplyr solution could be

df %>%
  mutate(FLD = factor(FLD, levels = c("TERRIBLE", "BAD", "GOOD", "NA"))) %>%
  group_by(ID) %>%
  filter(FLD == first(levels(droplevels(FLD))))

  ID    FLD     
  <chr> <fct>   
1 A     GOOD    
2 B     TERRIBLE
3 C     BAD     
4 D     BAD     
5 E     TERRIBLE
6 F     NA 
  •  Tags:  
  • r
  • Related