R / tidyverse - find intersect between multiple character columns

Time:09-21

I have the following problem: I have a tibble with multiple character columns. I tried to provide an MRE (minimal reproducible example) below:

library(tidyverse)
df <- tibble(food = c("pizza, bread, apple","joghurt, cereal, banana"), 
             food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
             food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt")
             )
df %>%
  # rowwise() %>%
  mutate(allcolumns = map2(
    str_split(food, ", "),
    str_split(food2, ", "),
    # map2() iterates over exactly two inputs (.x and .y), so a third column has nowhere to go:
    # str_split(food3, ", "),
    intersect
  ) %>% unlist()
  ) -> df_new

My goal is to get, for each row, the words that are common to all columns. Words within a column are separated by ", ". In the MRE I am able to find the intersection between two columns, but I couldn't extend it to all three. I experimented with Reduce but was not able to get it to work.

Edit: I would also like to append the result to the existing tibble as a new column.
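The Reduce idea mentioned in the question does work once each column has been split into word vectors. A minimal base-R sketch, assuming the words are always separated by exactly ", "; the names row1, split_cols and common are illustrative. Reduce folds intersect pairwise, so the same call covers two, three or more columns, and Map walks the split columns in parallel, one row at a time.

```r
# Same data as the MRE, as a plain data.frame (stringsAsFactors = FALSE so
# strsplit() also works on R versions before 4.0)
df <- data.frame(
  food  = c("pizza, bread, apple", "joghurt, cereal, banana"),
  food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
  food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt"),
  stringsAsFactors = FALSE
)

# One row: Reduce computes intersect(intersect(food, food2), food3)
row1 <- strsplit(c(df$food[1], df$food2[1], df$food3[1]), ", ", fixed = TRUE)
common_row1 <- Reduce(intersect, row1)

# All rows: split every column, then let Map() pass the i-th word vector
# of each column to a function that folds intersect() over them
split_cols <- lapply(df, strsplit, split = ", ", fixed = TRUE)
common <- do.call(Map, c(list(f = function(...) Reduce(intersect, list(...))),
                         split_cols))

# Append as a new column; rows with no shared word would become ""
df$Intersect <- vapply(common, paste, character(1), collapse = ", ")
```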

Answer:

We may use map to loop over the columns and apply str_split, transpose the result so that it is grouped by row instead of by column, and then reduce each row with intersect to get the row-wise intersection.

library(dplyr)
library(purrr)
library(stringr)
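df %>% 
   purrr::map(str_split, ", ") %>%        # split every column into a list of word vectors
   transpose %>%                          # regroup the pieces by row instead of by column
   purrr::map_chr(reduce, intersect) %>%  # intersect all columns within each row
   mutate(df, Intersect = .)              # add the result to df as a new column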

Output:

# A tibble: 2 x 4
  food                    food2                      food3                       Intersect
  <chr>                   <chr>                      <chr>                       <chr>    
1 pizza, bread, apple     bread, sausage, strawberry ice cream, bread, milkshake bread    
2 joghurt, cereal, banana joghurt, oat, bacon        melon, cake, joghurt        joghurt  
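To see why transpose is needed, the pipeline above can be run step by step. This sketch rebuilds the MRE tibble; the intermediate names by_column, by_row and common are illustrative. It keeps the last step as map rather than map_chr, because map_chr stops with an error whenever a row has zero or more than one common word, while map simply returns a character vector per row.

```r
library(purrr)
library(stringr)
library(tibble)

df <- tibble(
  food  = c("pizza, bread, apple", "joghurt, cereal, banana"),
  food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
  food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt")
)

# Step 1: one list per column, each holding one word vector per row
by_column <- map(df, str_split, pattern = ", ")

# Step 2: transpose() flips the nesting to one list per row,
# each holding that row's word vectors from food, food2 and food3
by_row <- transpose(by_column)

# Step 3: fold intersect() over the columns within each row
common <- map(by_row, reduce, intersect)
```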

Or we may also use pmap, which returns Intersect as a list column (one character vector per row):

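df %>%
    # split every column, then pass each row's word vectors to reduce(intersect);
    # a lambda is used because passing extra arguments through across(...) is deprecated in dplyr 1.1.0
    mutate(Intersect = pmap(across(everything(), ~ str_split(.x, ", ")), 
        ~ list(...) %>%
              reduce(intersect)))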
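Because pmap keeps the result as a list column, it also copes with rows that share no word at all. A sketch assuming a hypothetical third row (rice, beans, tofu and noodles are invented here for illustration) in which no word appears in all three columns; paste(collapse = ", ") is one possible way to flatten the list column into an ordinary character column.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Hypothetical third row with no word common to all three columns
df3 <- tibble(
  food  = c("pizza, bread, apple", "joghurt, cereal, banana", "rice, beans"),
  food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon", "rice, tofu"),
  food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt", "noodles, tofu")
)

df3_new <- df3 %>%
  mutate(
    # list column: one character vector per row, possibly character(0)
    Intersect = pmap(across(everything(), ~ str_split(.x, ", ")),
                     ~ reduce(list(...), intersect)),
    # flattened copy: rows without a common word become ""
    Intersect_str = map_chr(Intersect, paste, collapse = ", ")
  )
```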