Home > Back-end >  R group_by and filter with multiple conditions
R group_by and filter with multiple conditions

Time:10-25

I have a dataset like the following:

ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5)
LABEL <- c('TT', 'CH', 'AB', 'TT', 'CH', 'AB', 'TT', 'CH', 'TT', 'CH', 'AB')
VALUE <- c(2121, 32, 1, 2121, 32, 2, 2121, 99, 2222, 32, 9)

DATA <- data.frame(ID, LABEL, VALUE)

I'm trying to group the data by ID (it's a long file where ID replicates) then filter where the same ID meets multiple conditions. I've tried the following:

X <- DATA %>% 
  group_by(ID) %>% 
  filter( 
    (LABEL == "TT" & VALUE == "2121") & 
      (LABEL == "CH" & VALUE == "32") )

I want to get a dataset with only the observations with ID 1 and 2 which meet both conditions simultaneously.

Any help is appreciated.

CodePudding user response:

We can use any - after grouping by 'ID', if we want to check for presence of the strings in two columns, wrap with any as multiple expression cannot be TRUE for the same location. Then, we use & so that if only both are present it returns TRUE (any - returns a single TRUE/FALSE)

library(dplyr)
DATA %>% 
   group_by(ID) %>% 
   filter( 
    any((LABEL == "TT" & VALUE == "2121") &
      any((LABEL == "CH" & VALUE == "32") ))) %>%
   ungroup

-output

# A tibble: 6 × 3
     ID LABEL VALUE
  <dbl> <chr> <dbl>
1     1 TT     2121
2     1 CH       32
3     1 AB        1
4     2 TT     2121
5     2 CH       32
6     2 AB        2

Or slighly more compact option by pasteing the columns and then check if all the elements in the lhs of %in% are TRUE for filtering the groups

library(stringr)
DATA %>%
   group_by(ID) %>%
   filter(all(c("TT2121", "CH32") %in% str_c(LABEL, VALUE))) %>% 
   ungroup

-output

# A tibble: 6 × 3
     ID LABEL VALUE
  <dbl> <chr> <dbl>
1     1 TT     2121
2     1 CH       32
3     1 AB        1
4     2 TT     2121
5     2 CH       32
6     2 AB        2
  • Related