I want to filter my data using conditions, but the presence of NA affects the results.
For example:
dt <- data.table(a=c(1:4,NA), b=c(NA,2,1,4,5), d=c(1,2,NA,4,NA))
dt
a b d
1: 1 NA 1
2: 2 2 2
3: 3 1 NA
4: 4 4 4
5: NA 5 NA
when I do
subset(dt, !(b < a))
a b d
1: 2 2 2
2: 4 4 4
i.e., if either a or b is NA, that row is excluded:
but the result I want is
a b d
1: 1 NA 1
2: 2 2 2
3: 4 4 4
4: NA 5 NA
that is, I just want one row to be excluded if and only if the condition verifies.
If I add more conditions, like subset(dt, is.na(a) | is.na(b) | !(b < a))
it works as expected but I was looking for a way to express 'if and only if' through operators like &
and |
Is this possible?
Thank you!
CodePudding user response:
This works:
dt[!which(dt$b < dt$a), ]
a b d
1: 1 NA 1
2: 2 2 2
3: 4 4 4
4: NA 5 NA
In this workaround I am only selecting the rows which don't return TRUE
for condition df$b < df$a
. Meaning they can return FALSE
or NA
or whatever really.
CodePudding user response:
We may use if_any
library(dplyr)
dt %>%
filter(if_any(c(b, a), is.na)|b >=a)
a b d
1: 1 NA 1
2: 2 2 2
3: 4 4 4
4: NA 5 NA