Home > database >  Check if a value among a list exist in multiple column R data.table
Check if a value among a list exist in multiple column R data.table

Time:03-04

Problem

I have a dataset with a lot of identical kind of variables which can contain the same values. I would like to check if among those variable, we can find at list one value among a list of values.

Example

Let say that we have a dataset with 3 variables of factor type DAS1, DAS2, DAS3. The possible values for those variables are c("0", "1", "x", "y") (note that I'm not trying to differentiate numbers and letters. Consider every values as characters).

library(data.table)

start <- data.table::data.table(DAS1 = c("0","1","x","0","1","0","1"),
                       DAS2 = c("x","y","0","0","x","1","0"),
                       DAS3 = c("1","1","y","1","x","y","0"))

My objective is to find which row contain at least one observation of the values "x" or "y".

result <- data.table::data.table(DAS1 = c("0","1","x","0","1","0","1"),
                       DAS2 = c("x","y","0","0","x","1","0"),
                       DAS3 = c("1","1","y","1","x","y","0"),
                       xy = c(T,T,T,F,T,T,F))

Condition

I really want to do it with the data.table package and not dplyr because I mostly use data.table and I don't like to switch between the two packages if it's not necessary.

Answer by @lovalery

start[, xy := apply(start[,c("DAS1", "DAS2", "DAS3")],1, function(x) any(x %in% c("x", "y")))][]

CodePudding user response:

You could do:

Reprex

  • Code
library(data.table)

start[, xy := apply(start,1, function(x) any(x == "x" | x == "y"))][]
  • Output
#>    DAS1 DAS2 DAS3    xy
#> 1:    0    x    1  TRUE
#> 2:    1    y    1  TRUE
#> 3:    x    0    y  TRUE
#> 4:    0    0    1 FALSE
#> 5:    1    x    x  TRUE
#> 6:    0    1    y  TRUE
#> 7:    1    0    0 FALSE

Created on 2022-03-04 by the reprex package (v2.0.1)

CodePudding user response:

You can try this

> start[, xy := rowSums((.SD == "x")   (.SD == "y")) > 0][]
   DAS1 DAS2 DAS3    xy
1:    0    x    1  TRUE
2:    1    y    1  TRUE
3:    x    0    y  TRUE
4:    0    0    1 FALSE
5:    1    x    x  TRUE
6:    0    1    y  TRUE
7:    1    0    0 FALSE

or

> start[, xy := rowSums(Reduce(` `, Map(`==`, c("x", "y"), list(.SD)))) > 0][]
   DAS1 DAS2 DAS3    xy
1:    0    x    1  TRUE
2:    1    y    1  TRUE
3:    x    0    y  TRUE
4:    0    0    1 FALSE
5:    1    x    x  TRUE
6:    0    1    y  TRUE
7:    1    0    0 FALSE
  • Related