Intersect 2 columns in data.table R

Time:05-18

I have a data table like the example below:

col1 col2
a,b,c,d a,c,d
r,h,g r

Each cell of this table contains a comma-separated list. I want to create two more columns, one holding the intersection and one holding the union of col1 and col2 for each row.

The output that I want is this:

col1 col2 inter union
a,b,c,d a,c,d a,c,d a,b,c,d
r,h,g r r r,h,g

I tried this command but it gives me an error:

data$inter = intersect(data$col1, data$col2)

The error is:

Error in set(x, j = name, value = value) : 
  Supplied 4 items to be assigned to 748 items of column 'intersect'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code

FYI: this is not the real data, just a simplified example; the error shown above comes from the real data.

Thank you in advance

CodePudding user response:

I used this command, and it seems to work:

mapply(function(x, y) paste0(intersect(x, y), collapse = " "), strsplit(data$col1, '\\s'), strsplit(data$col2, '\\s'))
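Note that the command above splits on whitespace and joins with spaces, while the example in the question is comma-separated. A minimal sketch of the same idea adapted to the example, assuming plain character vectors `col1` and `col2` that stand in for the two columns:

```r
# Stand-ins for data$col1 and data$col2 from the question's example
col1 <- c("a,b,c,d", "r,h,g")
col2 <- c("a,c,d", "r")

# Split each cell on a literal comma, intersect the pieces pairwise,
# and join each result back into one comma-separated string
inter <- mapply(
  function(x, y) paste(intersect(x, y), collapse = ","),
  strsplit(col1, ",", fixed = TRUE),
  strsplit(col2, ",", fixed = TRUE),
  USE.NAMES = FALSE
)

inter  # "a,c,d" "r"
```

Because `mapply` returns one string per row here, the result has the same length as the input columns and can be assigned as a new column.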

CodePudding user response:

There are several problems here:

  1. You should convert your elements into vectors. As is, the functions intersect and union won't work on single strings.
  2. You should work with columns as lists and use a rowwise computation to achieve your results.

library(tidyverse)

# Split every cell on "," so that each cell holds a character vector
apply(data, c(1, 2), \(x) strsplit(x, ",")[[1]]) %>% 
  as_tibble() %>% 
  rowwise() %>%   # evaluate the set operations one row at a time
  mutate(inter = list(intersect(col1, col2)),
         union = list(union(col1, col2)))

# A tibble: 2 × 4
# Rowwise: 
  col1      col2      inter     union    
  <list>    <list>    <list>    <list>   
1 <chr [4]> <chr [3]> <chr [3]> <chr [4]>
2 <chr [3]> <chr [1]> <chr [1]> <chr [3]>
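The first point in the list above can be checked directly in base R. This is only an illustration on the question's example values; `col1`, `col2`, `x` and `y` are names chosen for the sketch:

```r
col1 <- c("a,b,c,d", "r,h,g")
col2 <- c("a,c,d", "r")

# Called on whole columns, intersect() compares the complete strings.
# No string appears in both columns, so the result is empty and its
# length does not match the number of rows.
whole <- intersect(col1, col2)
length(whole)  # 0

# After splitting a single row into its elements, the set functions
# compare the individual letters.
x <- strsplit(col1[1], ",", fixed = TRUE)[[1]]
y <- strsplit(col2[1], ",", fixed = TRUE)[[1]]
intersect(x, y)  # "a" "c" "d"
union(x, y)      # "a" "b" "c" "d"
```

The length mismatch in the first call is the same kind of mismatch the error message in the question complains about: the right-hand side does not supply one value per row.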

You can get back to your original string-like dataframe by using paste over all columns:

... %>%
  mutate(across(everything(), \(x) paste(x, collapse = ",")))

# A tibble: 2 × 4
# Rowwise: 
  col1    col2  inter union  
  <chr>   <chr> <chr> <chr>  
1 a,b,c,d a,c,d a,c,d a,b,c,d
2 r,h,g   r     r     r,h,g  
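Since the question is about data.table, the same row-wise idea can also be sketched without the tidyverse. The helper name `combine_sets` is hypothetical and not part of any package, and the table is rebuilt from the question's example:

```r
library(data.table)

data <- data.table(col1 = c("a,b,c,d", "r,h,g"),
                   col2 = c("a,c,d", "r"))

# Hypothetical helper: split both columns on ",", apply a set operation
# to each pair of rows, and return one comma-separated string per row
combine_sets <- function(a, b, op) {
  mapply(
    function(x, y) paste(op(x, y), collapse = ","),
    strsplit(a, ",", fixed = TRUE),
    strsplit(b, ",", fixed = TRUE),
    USE.NAMES = FALSE
  )
}

# Add both columns by reference; base:: is spelled out so that the
# functions are used even once a column named "union" exists
data[, inter := combine_sets(col1, col2, base::intersect)]
data[, union := combine_sets(col1, col2, base::union)]
```

Each call returns exactly one string per row, so `:=` receives a right-hand side of the correct length and nothing has to be recycled.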