I have a dataset like the following:
ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5)
LABEL <- c('TT', 'CH', 'AB', 'TT', 'CH', 'AB', 'TT', 'CH', 'TT', 'CH', 'AB')
VALUE <- c(2121, 32, 1, 2121, 32, 2, 2121, 99, 2222, 32, 9)
DATA <- data.frame(ID, LABEL, VALUE)
I'm trying to group the data by ID (it's a long file where ID replicates) then filter where the same ID meets multiple conditions. I've tried the following:
X <- DATA %>%
group_by(ID) %>%
filter(
(LABEL == "TT" & VALUE == "2121") &
(LABEL == "CH" & VALUE == "32") )
I want to get a dataset with only the observations with ID 1 and 2 which meet both conditions simultaneously.
Any help is appreciated.
CodePudding user response:
We can use any
- after grouping by 'ID', if we want to check for presence of the strings in two columns, wrap with any
as multiple expression cannot be TRUE for the same location. Then, we use &
so that if only both are present it returns TRUE
(any
- returns a single TRUE/FALSE)
library(dplyr)
DATA %>%
group_by(ID) %>%
filter(
any((LABEL == "TT" & VALUE == "2121") &
any((LABEL == "CH" & VALUE == "32") ))) %>%
ungroup
-output
# A tibble: 6 × 3
ID LABEL VALUE
<dbl> <chr> <dbl>
1 1 TT 2121
2 1 CH 32
3 1 AB 1
4 2 TT 2121
5 2 CH 32
6 2 AB 2
Or slighly more compact option by paste
ing the columns and then check if all
the elements in the lhs of %in%
are TRUE for filter
ing the groups
library(stringr)
DATA %>%
group_by(ID) %>%
filter(all(c("TT2121", "CH32") %in% str_c(LABEL, VALUE))) %>%
ungroup
-output
# A tibble: 6 × 3
ID LABEL VALUE
<dbl> <chr> <dbl>
1 1 TT 2121
2 1 CH 32
3 1 AB 1
4 2 TT 2121
5 2 CH 32
6 2 AB 2