Intersecting elements after grouping by a variable

Time:10-19

I have data that looks as follows:

library(dplyr)

toy.dat <- data.frame(group = c(rep("A_0", 3), rep("A_1", 2),
                                rep("B_0", 3), rep("B_1", 3)))
toy.dat$letters <- c("A", "B", "C", "A", "D", "C", "E", "F", "A", "B", "F")

toy.dat %>% 
  group_by(group) %>% 
  summarise(letters = list(letters), num = n()) %>%
  mutate(group_number = gsub(".*_", "", group))


group   letters            num   group_number
A_0     c("A", "B", "C")   3     0
A_1     c("A", "D")        2     1
B_0     c("C", "E", "F")   3     0
B_1     c("A", "B", "F")   3     1

I would like to group by group_number, find the intersection of the letters across the rows in each group, and add it to the data frame as a new column.

The output should give "C" for A_0 and B_0, and "A" for A_1 and B_1.
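As a quick check of that expectation, the letter vectors from the table above can be intersected pairwise with base R's intersect(). The variable names a0, b0, a1 and b1 are illustrative only.

```r
# Letter vectors copied from the summarised table above (names are illustrative)
a0 <- c("A", "B", "C")  # A_0, group_number 0
b0 <- c("C", "E", "F")  # B_0, group_number 0
a1 <- c("A", "D")       # A_1, group_number 1
b1 <- c("A", "B", "F")  # B_1, group_number 1

intersect(a0, b0)  # "C"
intersect(a1, b1)  # "A"
```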

Answer:

We may use reduce() from purrr to fold intersect() over the list of letter vectors within each group_number:

library(dplyr)
library(purrr)

toy.dat %>%
  group_by(group) %>%
  summarise(letters = list(letters), num = n()) %>%
  mutate(group_number = gsub(".*_", "", group)) %>%
  # within each group_number, intersect all letter vectors; nintersect counts the common letters
  group_by(group_number) %>%
  mutate(intersect = list(reduce(letters, intersect))) %>%
  ungroup() %>%
  mutate(nintersect = lengths(intersect))

Output:

# A tibble: 4 × 6
  group letters     num group_number intersect nintersect
  <chr> <list>    <int> <chr>        <list>         <int>
1 A_0   <chr [3]>     3 0            <chr [1]>          1
2 A_1   <chr [2]>     2 1            <chr [1]>          1
3 B_0   <chr [3]>     3 0            <chr [1]>          1
4 B_1   <chr [3]>     3 1            <chr [1]>          1
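reduce(letters, intersect) folds intersect() over every letter vector in a group_number, computing intersect(intersect(x, y), z) when three groups share a suffix, so it is not limited to pairs. As a dependency-free sketch of the same idea (an alternative to the answer, not its code), base R's Reduce() can perform the fold; the helper intersect_by_suffix() is hypothetical and returns a named list instead of adding a column.

```r
# Toy data from the question
toy.dat <- data.frame(group = c(rep("A_0", 3), rep("A_1", 2),
                                rep("B_0", 3), rep("B_1", 3)))
toy.dat$letters <- c("A", "B", "C", "A", "D", "C", "E", "F", "A", "B", "F")

# Hypothetical helper: for each group, the letters shared by all groups
# with the same suffix after "_" (the question's group_number)
intersect_by_suffix <- function(df) {
  by_group <- split(df$letters, df$group)   # one letter vector per group
  suffix <- sub(".*_", "", names(by_group))  # e.g. "0", "1", "0", "1"
  common <- lapply(split(seq_along(by_group), suffix),
                   function(i) Reduce(intersect, by_group[i]))  # fold intersect()
  res <- common[suffix]                      # map each suffix result back to its group
  names(res) <- names(by_group)
  res
}

intersect_by_suffix(toy.dat)
# returns list(A_0 = "C", A_1 = "A", B_0 = "C", B_1 = "A")
```

If a third group such as C_0 were added, the same Reduce() call would intersect all three letter vectors for suffix 0.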