Find common values of two lists in R data table

Time:06-16

I have a data table test:

id  array1             array2
1   c(1, 2, 3, 4, 5)   c(3, 4, 5)
2   c(6, 7, 8, 9, 10)  c(6, 7, 0)

> str(test)
Classes ‘data.table’ and 'data.frame':  2 obs. of  3 variables:
 $ id    : num  1 2
 $ array1:List of 2
  ..$ : num  1 2 3 4 5
  ..$ : num  6 7 8 9 10
 $ array2:List of 2
  ..$ : num  3 4 5
  ..$ : num  6 7 0
 - attr(*, ".internal.selfref")=<externalptr> 

For each row, I want to find the values that are common to array1 and array2, as well as the values that are in array1 but not in array2.

I tried using setdiff() and intersect() but got incorrect results:

test[, `:=` (diff = setdiff(array1, array2),
             common = intersect(array1, array2))]

id  array1             array2      diff               common
1   c(1, 2, 3, 4, 5)   c(3, 4, 5)  c(1, 2, 3, 4, 5)   NULL
2   c(6, 7, 8, 9, 10)  c(6, 7, 0)  c(6, 7, 8, 9, 10)  NULL

Expected output:

id  array1             array2      diff         common
1   c(1, 2, 3, 4, 5)   c(3, 4, 5)  c(1, 2)      c(3, 4, 5)
2   c(6, 7, 8, 9, 10)  c(6, 7, 0)  c(8, 9, 10)  c(6, 7)

Will be grateful for any help!

Answer:

Because array1 and array2 are list columns, use Map to loop over the corresponding list elements and apply the functions pair by pair:

library(data.table)
test[, c("diff", "common") := list(Map(setdiff, array1, array2), 
      Map(intersect, array1, array2))]

Output:

> test
      id         array1 array2     diff common
   <int>         <list> <list>   <list> <list>
1:     1      1,2,3,4,5  3,4,5      1,2  3,4,5
2:     2  6, 7, 8, 9,10  6,7,0  8, 9,10    6,7
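
To see why the original attempt returned all of array1 and an empty intersection, it can help to reproduce it in base R. The sketch below assumes the two columns are plain lists called a1 and a2 (hypothetical names, not from the question). Called directly on lists, setdiff() and intersect() compare whole list elements with each other, whereas Map() pairs the elements up row by row.

```r
# Hypothetical stand-ins for the two list columns in the question.
a1 <- list(1:5, 6:10)
a2 <- list(3:5, c(6L, 7L, 0L))

# Called on the lists themselves, the set functions treat each vector as one item.
# No element of a1 matches an element of a2, so nothing is removed and nothing is shared.
whole_diff   <- setdiff(a1, a2)    # still holds both elements of a1
whole_common <- intersect(a1, a2)  # empty

# Map() calls the function once per pair: (a1[[1]], a2[[1]]), then (a1[[2]], a2[[2]]).
row_diff   <- Map(setdiff, a1, a2)
row_common <- Map(intersect, a1, a2)

str(row_diff)
str(row_common)
```

The per-pair results are lists with one vector per row, which is the shape a list column needs.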

Or use a single Map with transpose:

test[, c("diff", "common") := transpose(Map(function(x, y) 
      list(setdiff(x, y), intersect(x, y)), array1, array2))]
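
The single Map call above produces one two-element list per row, while := needs one list per new column, and transpose is what flips that layout. As a rough illustration of the reshaping only, the base R sketch below does the same flip by hand with lapply. The names a1, a2, per_row, and per_col are hypothetical, and this is not meant to show how data.table implements transpose.

```r
# Hypothetical stand-ins for the two list columns in the question.
a1 <- list(1:5, 6:10)
a2 <- list(3:5, c(6L, 7L, 0L))

# One entry per row, each holding list(diff, common) for that row.
per_row <- Map(function(x, y) list(setdiff(x, y), intersect(x, y)), a1, a2)

# Flip to one entry per output column: all the diffs, then all the common values.
per_col <- list(
  diff   = lapply(per_row, `[[`, 1),
  common = lapply(per_row, `[[`, 2)
)

str(per_col)
```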

Or group by 'id' (assuming there are no duplicate ids) and extract the first element of each list column with [[1]]:

test[, c('diff', 'common') := .(.(setdiff(array1[[1]], array2[[1]])),
                                .(intersect(array1[[1]], array2[[1]]))), by = id]

Data:

test <- structure(list(id = 1:2, array1 = list(1:5, 6:10), 
   array2 = list(
    3:5, c(6L, 7L, 0L))), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))