I have a data table like the example below:
col1 | col2 |
---|---|
a,b,c,d | a,c,d |
r,h,g | r |
So each column of this table contains a comma-separated list. I want to create two more columns, one holding the intersection and one holding the union of col1 and col2 for each row.
The output that I want is this one:
col1 | col2 | inter | union |
---|---|---|---|
a,b,c | a,c,d,k | a,c | a,b,c,d,k |
r,h,g | r | r | r,h,g |
I tried this command but it gives me an error:
data$inter = intersect(data$col1, data$col2)
The error is:
Error in set(x, j = name, value = value) :
Supplied 4 items to be assigned to 748 items of column 'intersect'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code
FYI: this is not the real data, just a simplified example; the error shown above comes from the real data.
Thank you in advance
CodePudding user response:
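I used this command, and it seems to work (note that it splits on whitespace and joins with a space, so adjust the separator if your values are comma-separated):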
mapply(function(x, y) paste0(intersect(x, y), collapse = " "), strsplit(data$col1, '\\s'), strsplit(data$col2, '\\s'))
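For the comma-separated sample in the question, the same idea could look roughly like the sketch below. It is only an illustration: the toy data frame is rebuilt from the question's input table, and the names `parts1` and `parts2` are made up here.

```r
# Toy data rebuilt from the question's input table (an assumption, not the real data)
data <- data.frame(col1 = c("a,b,c,d", "r,h,g"),
                   col2 = c("a,c,d", "r"),
                   stringsAsFactors = FALSE)

# Split each cell on "," so every row becomes a character vector
parts1 <- strsplit(data$col1, ",", fixed = TRUE)
parts2 <- strsplit(data$col2, ",", fixed = TRUE)

# Walk over both lists in parallel and collapse each result back to one string
data$inter <- mapply(function(x, y) paste(intersect(x, y), collapse = ","),
                     parts1, parts2)
data$union <- mapply(function(x, y) paste(union(x, y), collapse = ","),
                     parts1, parts2)

print(data)
```

Because `mapply` returns one string per row, the new columns always have the same length as the table, which avoids the recycling error.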
CodePudding user response:
There are several problems here:
- You should convert your elements into vectors. As is, the functions `intersect` and `union` won't work on single strings.
- You should work with columns as lists and use a `rowwise` computation to achieve your results.
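To illustrate the first point, here is a small base R sketch using toy vectors copied from the question's input table (the variable names are made up). It shows that `intersect` on whole columns compares complete strings, so the result length has nothing to do with the number of rows, which is presumably what triggered the "Supplied 4 items to be assigned to 748 items" error on the real data.

```r
# Toy columns mirroring the question's input table
col1 <- c("a,b,c,d", "r,h,g")
col2 <- c("a,c,d", "r")

# On whole columns, intersect() compares complete strings,
# not the comma-separated elements inside them
whole <- intersect(col1, col2)
print(whole)      # character(0): no complete string occurs in both columns

# After splitting, each row is a character vector of single elements
parts1 <- strsplit(col1, ",", fixed = TRUE)
parts2 <- strsplit(col2, ",", fixed = TRUE)

first_row <- intersect(parts1[[1]], parts2[[1]])
print(first_row)  # "a" "c" "d"
```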
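library(tidyverse)
# Split every cell on "," so that each cell holds a character vector
apply(data, c(1, 2), \(x) strsplit(x, ",")[[1]]) %>%
  as_tibble() %>%
  # Evaluate the set operations one row at a time
  rowwise() %>%
  mutate(inter = list(intersect(col1, col2)),
         union = list(union(col1, col2)))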
# A tibble: 2 × 4
# Rowwise:
  col1      col2      inter     union
  <list>    <list>    <list>    <list>
1 <chr [4]> <chr [3]> <chr [3]> <chr [4]>
2 <chr [3]> <chr [1]> <chr [1]> <chr [3]>
You can get back to your original string-like dataframe by using `paste` over all columns:
... %>%
  mutate(across(everything(), \(x) paste(x, collapse = ",")))
# A tibble: 2 × 4
# Rowwise:
  col1    col2  inter union
  <chr>   <chr> <chr> <chr>
1 a,b,c,d a,c,d a,c,d a,b,c,d
2 r,h,g   r     r     r,h,g
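Since the error in the question comes from `set()`, the data is probably a data.table. Under that assumption, a minimal sketch that adds both columns by reference with `:=` might look like this; `set_op` is a hypothetical helper name introduced only for this example, and the toy table is rebuilt from the question's input.

```r
library(data.table)

# Toy data mirroring the question's input table (an assumption, not the real data)
data <- data.table(col1 = c("a,b,c,d", "r,h,g"),
                   col2 = c("a,c,d", "r"))

# Hypothetical helper: split both columns on ",", apply the set function f
# row by row, and collapse each result back into a single string
set_op <- function(f, a, b) {
  mapply(function(x, y) paste(f(x, y), collapse = ","),
         strsplit(a, ",", fixed = TRUE),
         strsplit(b, ",", fixed = TRUE))
}

# base:: is spelled out so that a column named "union" cannot mask the function
data[, `:=`(inter = set_op(base::intersect, col1, col2),
            union = set_op(base::union, col1, col2))]

print(data)
```

The helper returns exactly one string per row, so the right-hand side of `:=` always matches the number of rows and nothing needs to be recycled.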