Problem
I have a dataset with a lot of identical kind of variables which can contain the same values. I would like to check if among those variable, we can find at list one value among a list of values.
Example
Let say that we have a dataset with 3 variables of factor type DAS1
, DAS2
, DAS3
.
The possible values for those variables are c("0", "1", "x", "y")
(note that I'm not trying to differentiate numbers and letters. Consider every values as characters).
library(data.table)
start <- data.table::data.table(DAS1 = c("0","1","x","0","1","0","1"),
DAS2 = c("x","y","0","0","x","1","0"),
DAS3 = c("1","1","y","1","x","y","0"))
My objective is to find which row contain at least one observation of the values "x"
or "y"
.
result <- data.table::data.table(DAS1 = c("0","1","x","0","1","0","1"),
DAS2 = c("x","y","0","0","x","1","0"),
DAS3 = c("1","1","y","1","x","y","0"),
xy = c(T,T,T,F,T,T,F))
Condition
I really want to do it with the data.table
package and not dplyr
because I mostly use data.table
and I don't like to switch between the two packages if it's not necessary.
Answer by @lovalery
start[, xy := apply(start[,c("DAS1", "DAS2", "DAS3")],1, function(x) any(x %in% c("x", "y")))][]
CodePudding user response:
You could do:
Reprex
- Code
library(data.table)
start[, xy := apply(start,1, function(x) any(x == "x" | x == "y"))][]
- Output
#> DAS1 DAS2 DAS3 xy
#> 1: 0 x 1 TRUE
#> 2: 1 y 1 TRUE
#> 3: x 0 y TRUE
#> 4: 0 0 1 FALSE
#> 5: 1 x x TRUE
#> 6: 0 1 y TRUE
#> 7: 1 0 0 FALSE
Created on 2022-03-04 by the reprex package (v2.0.1)
CodePudding user response:
You can try this
> start[, xy := rowSums((.SD == "x") (.SD == "y")) > 0][]
DAS1 DAS2 DAS3 xy
1: 0 x 1 TRUE
2: 1 y 1 TRUE
3: x 0 y TRUE
4: 0 0 1 FALSE
5: 1 x x TRUE
6: 0 1 y TRUE
7: 1 0 0 FALSE
or
> start[, xy := rowSums(Reduce(` `, Map(`==`, c("x", "y"), list(.SD)))) > 0][]
DAS1 DAS2 DAS3 xy
1: 0 x 1 TRUE
2: 1 y 1 TRUE
3: x 0 y TRUE
4: 0 0 1 FALSE
5: 1 x x TRUE
6: 0 1 y TRUE
7: 1 0 0 FALSE