Home > Blockchain >  Subset rows on one datatable based on multiple column values of another datatable
Subset rows on one datatable based on multiple column values of another datatable

Time:08-10

I have two datatables. The original one looks like this:

library(data.table)
DT <- data.table(
  x = rep(c("b", "a", "c"), each = 3),
  v = rep(c("d", "e", "f"), times = 3),
  foo = 10:18
)

DT
#>    x v foo
#> 1: b d  10
#> 2: b e  11
#> 3: b f  12
#> 4: a d  13
#> 5: a e  14
#> 6: a f  15
#> 7: c d  16
#> 8: c e  17
#> 9: c f  18

The second one looks like this:

X <- data.table(x = c("c", "b"), v = c("d", "f"))

X
#>    x v
#> 1: c d
#> 2: b f

I want to subset the rows in DT whose values are in the column values of X.

This is what I want to do:

DT[x %chin% X$x & v %chin% X$v]
#>    x v foo
#> 1: b d  10
#> 2: b f  12
#> 3: c d  16
#> 4: c f  18

But I have 50 such columns.

Is there a succinct data.table way to achieve that?

I've tried using the on argument but doesn't return desired output:

DT[X, on = colnames(X)]
#>    x v foo
#> 1: c d  16
#> 2: b f  12

CodePudding user response:

You could generate the condition as an expression:

cols <- colnames(X)
condition = parse(text=paste0(cols,' %chin% X$', cols, collapse = ' & ')) 

condition
#> expression(x %chin% X$x & v %chin% X$v)

DT[eval(condition)]
#>         x      v   foo
#>    <char> <char> <int>
#> 1:      b      d    10
#> 2:      b      f    12
#> 3:      c      d    16
#> 4:      c      f    18

CodePudding user response:

Another way to solve your problem:

cols = colnames(X)

DT[do.call(pmin, Map(`%chin%`, DT[, ..cols], X[, ..cols]))==1]

# or

DT[rowSums(mapply(`%chin%`, DT[,..cols], X[,..cols]))==length(cols)]

        x      v   foo
   <char> <char> <int>
1:      b      d    10
2:      b      f    12
3:      c      d    16
4:      c      f    18
  • Related