I have a df that looks like the following:
ID CAT VAL
2 A I
2 A II
3 B I
3 B I
3 A Unknown
4 A III
What I would like to do is if the ID and the CAT match take only the row with the highest VAL using a custom order (Unknown, I,II,III,IV). So the output would look like:
ID CAT VAL
2 A II
3 B I
3 A Unknown
4 A III
CodePudding user response:
Create an ordered factor and then you can take max
for each group.
library(dplyr)
custom_order <- c('Unknown','I','II','III','IV')
df %>%
mutate(VAL = factor(VAL, custom_order, ordered = TRUE)) %>%
group_by(ID, CAT) %>%
summarise(VAL = max(VAL)) %>%
ungroup
# ID CAT VAL
# <int> <chr> <ord>
#1 2 A II
#2 3 A Unknown
#3 3 B I
#4 4 A III
Another option using arrange
and distinct
.
df %>%
arrange(ID, CAT, desc(match(VAL, custom_order))) %>%
distinct(ID, CAT, .keep_all = TRUE)
CodePudding user response:
We may use
library(dplyr)
df1 %>%
arrange(ID, CAT, match(VAL, c('Unknown','I','II','III','IV'))) %>%
group_by(ID, CAT) %>%
slice_tail(n = 1)
# A tibble: 4 × 3
# Groups: ID, CAT [4]
ID CAT VAL
<int> <chr> <chr>
1 2 A II
2 3 A Unknown
3 3 B I
4 4 A III
data
df1 <- structure(list(ID = c(2L, 2L, 3L, 3L, 3L, 4L), CAT = c("A", "A",
"B", "B", "A", "A"), VAL = c("I", "II", "I", "I", "Unknown",
"III")), class = "data.frame", row.names = c(NA, -6L))