Home > Mobile >  Filtering rule application when multiple categories exist in R
Filtering rule application when multiple categories exist in R

Time:09-28

I have a filtering question as below:

df <- data.frame(id = c(1,1,1, 2,2, 3,3, 4,4, 5, 6),
                 cat =c("A","B","C", "A","C", "B","C", "A","B", "A", "B"),
                 value = c("Y","N","Y", "Y","N", "N","Y", "N","Y", "N","Y"))

> df
   id cat value
1   1   A     Y
2   1   B     N
3   1   C     Y
4   2   A     Y
5   2   C     N
6   3   B     N
7   3   C     Y
8   4   A     N
9   4   B     Y
10  5   A     N
11  6   B     Y

Filtering rules:

1) When an id has three cat, I need to keep only cat B.

2) When an id has two cat - A and B, I need to keep only cat B.

3) When an id has two cat - A and C, I need to keep only cat C.

4) When an id has two cat - B and C, I need to keep only cat B.

5) When an id has one cat - keep the category.

In sum, desired cat order is first B, then C, and last A if there is only one category.

How can I achieve this desired dataframe:

> df1
  id cat value
1  1   B     N
2  2   C     N
3  3   B     N
4  4   B     Y
5  5   A     N
6  6   B     Y

It tried this below but it only considers cat B filtering. I also need to add C filtering when the cats are A and C.

df %>% group_by(id) %>% mutate(value = value[match("B", cat, nomatch = 1)]) %>% ungroup

CodePudding user response:

In base R, using just the priority of categories:

tmp <- lapply(split(df, df$id), 
              function(x) x[which.min(match(x$cat, c('B', 'C', 'A'))), ])
do.call(rbind, tmp)

dplyr way:

df %>%
  group_by(id) %>%
  slice(which.min(match(cat, c('B', 'C', 'A'))))

CodePudding user response:

Here is an option using some suitable case_when conditions:

df %>%
    group_by(id) %>%
    filter(case_when(
        length(cat) == 3 & all(cat %in% c("A", "B", "C")) ~ cat == "B",
        length(cat) == 2 & all(cat %in% c("A", "B")) ~ cat == "B",
        length(cat) == 2 & all(cat %in% c("A", "C")) ~ cat == "C",
        length(cat) == 2 & all(cat %in% c("B", "C")) ~ cat == "B",
        length(cat) == 1 ~ cat == unique(cat))) %>%
    ungroup()
## A tibble: 6 × 3
#     id cat   value
#  <dbl> <chr> <chr>
#1     1 B     N    
#2     2 C     N    
#3     3 B     N    
#4     4 B     Y    
#5     5 A     N    
#6     6 B     Y    

CodePudding user response:

An option with ordered

library(dplyr)
df %>%
   arrange(id, ordered(cat, c("B", "C", "A"))) %>%
   group_by(id) %>%
   slice_head(n = 1) %>% 
   ungroup

-output

# A tibble: 6 × 3
     id cat   value
  <dbl> <chr> <chr>
1     1 B     N    
2     2 C     N    
3     3 B     N    
4     4 B     Y    
5     5 A     N    
6     6 B     Y   
  • Related