Home > Software engineering >  R: Filter one column based on another with many to many mapping
R: Filter one column based on another with many to many mapping

Time:04-13

I have a dataset with an ID column and an item column. An ID is mapped to one or more items. The dataset has a row for each item mapped to an ID. I want to return IDs that contain my_items. The order of the items does not matter. I have a toy example below.

ID <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
item <- c("a", "b", "c", "a", "b", "c", "d", "a", "b", "d", "b", "a", "c")
df <- data.frame(cbind(ID, item))
df

enter image description here

my_items <- c("a", "b", "c")

My expected output would only include item ID 1 and 5.

CodePudding user response:

df %>% 
  group_by(ID) %>%
  filter(setequal(item,my_items))

Output

  ID    item 
  <chr> <chr>
1 1     a    
2 1     b    
3 1     c    
4 5     b    
5 5     a    
6 5     c  

CodePudding user response:

We can use all after creating a logical vector with %in% and grouping by 'ID' and also create a condition with n_distinct

library(dplyr)
df %>% 
   group_by(ID) %>% 
   filter(all(my_items %in% item), n_distinct(item) == 3) %>%
   ungroup

-output

# A tibble: 6 × 2
     ID item 
  <dbl> <chr>
1     1 a    
2     1 b    
3     1 c    
4     5 b    
5     5 a    
6     5 c   

CodePudding user response:

If we add arrange, we could also use identical in this case:

library(dplyr)

  df %>% 
    group_by(ID) %>%
    arrange(item, .by_group = TRUE) %>% 
    filter(identical(item,my_items))
  ID    item 
  <chr> <chr>
1 1     a    
2 1     b    
3 1     c    
4 5     a    
5 5     b    
6 5     c 
  • Related