I have data that looks as follows:
library(dplyr)

toy.dat <- data.frame(group = c(rep("A_0", 3), rep("A_1", 2),
                                rep("B_0", 3), rep("B_1", 3)))
toy.dat$letters <- c("A", "B", "C", "A", "D", "C", "E", "F", "A", "B", "F")

toy.dat %>%
  group_by(group) %>%
  summarise(letters = list(letters), num = n()) %>%
  mutate(group_number = gsub(".*_", "", group))
group  letters            num  group_number
A_0    c("A", "B", "C")   3    0
A_1    c("A", "D")        2    1
B_0    c("C", "E", "F")   3    0
B_1    c("A", "B", "F")   3    1
I would like to group by group_number, find the intersection of the letters across the rows in each group, and add the result to the data frame.
The output should give "C" for A_0 and B_0, and "A" for A_1 and B_1.
CodePudding user response:
We may use reduce() from purrr to apply intersect() successively across the letter vectors within each group_number:
library(dplyr)
library(purrr)

toy.dat %>%
  group_by(group) %>%
  summarise(letters = list(letters), num = n()) %>%
  mutate(group_number = gsub(".*_", "", group)) %>%
  group_by(group_number) %>%
  # intersect the letter vectors of all rows sharing a group_number
  mutate(intersect = list(reduce(letters, intersect))) %>%
  ungroup() %>%
  mutate(nintersect = lengths(intersect))
Output:
# A tibble: 4 × 6
  group letters   num   group_number intersect nintersect
  <chr> <list>    <int> <chr>        <list>         <int>
1 A_0   <chr [3]>     3 0            <chr [1]>          1
2 A_1   <chr [2]>     2 1            <chr [1]>          1
3 B_0   <chr [3]>     3 0            <chr [1]>          1
4 B_1   <chr [3]>     3 1            <chr [1]>          1
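
The tibble print only shows <chr [1]> for the intersect column, so it may help to see what the reduction does on the underlying vectors. Below is a minimal sketch using base R's Reduce(), which plays the same role here as purrr's reduce(); the names group0, group1, common0 and common1 are made up for illustration and are not part of the original code.

```r
# Letter vectors of the rows with group_number "0" (A_0 and B_0)
group0 <- list(c("A", "B", "C"), c("C", "E", "F"))
# Letter vectors of the rows with group_number "1" (A_1 and B_1)
group1 <- list(c("A", "D"), c("A", "B", "F"))

# Reduce() applies intersect() pairwise across the list elements
common0 <- Reduce(intersect, group0)
common1 <- Reduce(intersect, group1)

common0
# [1] "C"
common1
# [1] "A"
```

With more than two rows per group_number, the same call keeps intersecting the running result with the next vector, so only letters present in every row survive.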