I have a dataset with an ID column and an item column. An ID is mapped to one or more items. The dataset has a row for each item mapped to an ID. I want to return IDs that contain my_items
. The order of the items does not matter. I have a toy example below.
ID <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
item <- c("a", "b", "c", "a", "b", "c", "d", "a", "b", "d", "b", "a", "c")
df <- data.frame(cbind(ID, item))
df
my_items <- c("a", "b", "c")
My expected output would only include item ID 1 and 5.
CodePudding user response:
df %>%
group_by(ID) %>%
filter(setequal(item,my_items))
Output
ID item
<chr> <chr>
1 1 a
2 1 b
3 1 c
4 5 b
5 5 a
6 5 c
CodePudding user response:
We can use all
after creating a logical vector with %in%
and grouping by 'ID' and also create a condition with n_distinct
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(my_items %in% item), n_distinct(item) == 3) %>%
ungroup
-output
# A tibble: 6 × 2
ID item
<dbl> <chr>
1 1 a
2 1 b
3 1 c
4 5 b
5 5 a
6 5 c
CodePudding user response:
If we add arrange
, we could also use identical
in this case:
library(dplyr)
df %>%
group_by(ID) %>%
arrange(item, .by_group = TRUE) %>%
filter(identical(item,my_items))
ID item
<chr> <chr>
1 1 a
2 1 b
3 1 c
4 5 a
5 5 b
6 5 c