I have a filtering question as below:
df <- data.frame(id = c(1,1,1, 2,2, 3,3, 4,4, 5, 6),
cat =c("A","B","C", "A","C", "B","C", "A","B", "A", "B"),
value = c("Y","N","Y", "Y","N", "N","Y", "N","Y", "N","Y"))
> df
id cat value
1 1 A Y
2 1 B N
3 1 C Y
4 2 A Y
5 2 C N
6 3 B N
7 3 C Y
8 4 A N
9 4 B Y
10 5 A N
11 6 B Y
Filtering rules:
1)
When an id
has three cat
, I need to keep only cat
B
.
2)
When an id
has two cat
- A
and B
, I need to keep only cat
B
.
3)
When an id
has two cat
- A
and C
, I need to keep only cat
C
.
4)
When an id
has two cat
- B
and C
, I need to keep only cat
B
.
5)
When an id
has one cat
- keep the category.
In sum, desired cat
order is first B
, then C
, and last A
if there is only one category.
How can I achieve this desired dataframe:
> df1
id cat value
1 1 B N
2 2 C N
3 3 B N
4 4 B Y
5 5 A N
6 6 B Y
It tried this below but it only considers cat
B
filtering. I also need to add C
filtering when the cat
s are A
and C
.
df %>% group_by(id) %>% mutate(value = value[match("B", cat, nomatch = 1)]) %>% ungroup
CodePudding user response:
In base R, using just the priority of categories:
tmp <- lapply(split(df, df$id),
function(x) x[which.min(match(x$cat, c('B', 'C', 'A'))), ])
do.call(rbind, tmp)
dplyr
way:
df %>%
group_by(id) %>%
slice(which.min(match(cat, c('B', 'C', 'A'))))
CodePudding user response:
Here is an option using some suitable case_when
conditions:
df %>%
group_by(id) %>%
filter(case_when(
length(cat) == 3 & all(cat %in% c("A", "B", "C")) ~ cat == "B",
length(cat) == 2 & all(cat %in% c("A", "B")) ~ cat == "B",
length(cat) == 2 & all(cat %in% c("A", "C")) ~ cat == "C",
length(cat) == 2 & all(cat %in% c("B", "C")) ~ cat == "B",
length(cat) == 1 ~ cat == unique(cat))) %>%
ungroup()
## A tibble: 6 × 3
# id cat value
# <dbl> <chr> <chr>
#1 1 B N
#2 2 C N
#3 3 B N
#4 4 B Y
#5 5 A N
#6 6 B Y
CodePudding user response:
An option with ordered
library(dplyr)
df %>%
arrange(id, ordered(cat, c("B", "C", "A"))) %>%
group_by(id) %>%
slice_head(n = 1) %>%
ungroup
-output
# A tibble: 6 × 3
id cat value
<dbl> <chr> <chr>
1 1 B N
2 2 C N
3 3 B N
4 4 B Y
5 5 A N
6 6 B Y