I have a df that operates like this:
ID <- c("A", "B", "B", "C", "D", "D", "E", "F")
FLD<-c("GOOD","BAD","TERRIBLE", "BAD", "BAD","NA", "TERRIBLE", "NA")
df<-data.frame(ID, FLD)
I need to get it down so there's only rows with unique ID's. If ID's are duplicated, and the FLD value for one duplicate is BAD and one value is TERRIBLE, I want the one that says TERRIBLE to remain and the one that says BAD to drop. If, however, in the case of ID "D", the FLD values are BAD and NA, I want to keep BAD.
I'm not sure how to deal with it, because the actual dataset has tens of thousands of entries, so I can't just fix it by looking. Basically the dummy script I want is along the lines of:
If duplicated, and one duplicate is BAD is TERRIBLE, drop the row with BAD. But if one duplicate is BAD and one is NA, keep the one with BAD. So the outcome would be:
ID FLD
1 A GOOD
2 B TERRIBLE
3 C BAD
4 D BAD
5 E TERRIBLE
6 F NA
And I am lost.
CodePudding user response:
If we turn FLD
into an ordered factor we can then use aggregate
to get the minimum value per ID as follows:
df$FLD <- factor(df$FLD, levels=c("TERRIBLE", "BAD", "GOOD", "NA"), ordered = TRUE)
aggregate(data=df, FLD ~ ID, min)
ID FLD
1 A GOOD
2 B TERRIBLE
3 C BAD
4 D BAD
5 E TERRIBLE
6 F NA
CodePudding user response:
dplyr
solution could be
df %>%
mutate(FLD = factor(FLD, levels = c("TERRIBLE", "BAD", "GOOD", "NA"))) %>%
group_by(ID) %>%
filter(FLD == first(levels(droplevels(FLD))))
ID FLD
<chr> <fct>
1 A GOOD
2 B TERRIBLE
3 C BAD
4 D BAD
5 E TERRIBLE
6 F NA