Home > Software engineering >  Filter data.table if and only if condition is verified despite NA
Filter data.table if and only if condition is verified despite NA

Time:10-06

I want to filter my data using conditions, but the presence of NA affects the results.

For example:

dt <- data.table(a=c(1:4,NA), b=c(NA,2,1,4,5), d=c(1,2,NA,4,NA))
dt

    a  b  d
1:  1 NA  1
2:  2  2  2
3:  3  1 NA
4:  4  4  4
5: NA  5 NA

when I do

subset(dt, !(b < a))
   a b d
1: 2 2 2
2: 4 4 4

i.e., if either a or b is NA, that row is excluded:

but the result I want is

    a  b  d
1:  1 NA  1
2:  2  2  2
3:  4  4  4
4: NA  5 NA

that is, I just want one row to be excluded if and only if the condition verifies.

If I add more conditions, like subset(dt, is.na(a) | is.na(b) | !(b < a)) it works as expected but I was looking for a way to express 'if and only if' through operators like & and |

Is this possible?

Thank you!

CodePudding user response:

This works:

dt[!which(dt$b < dt$a), ]

    a  b  d
1:  1 NA  1
2:  2  2  2
3:  4  4  4
4: NA  5 NA

In this workaround I am only selecting the rows which don't return TRUE for condition df$b < df$a. Meaning they can return FALSE or NA or whatever really.

CodePudding user response:

We may use if_any

library(dplyr)
dt %>%
    filter(if_any(c(b, a), is.na)|b >=a)
    a  b  d
1:  1 NA  1
2:  2  2  2
3:  4  4  4
4: NA  5 NA
  • Related