In R, why does subsetting on a column with NA return a whole row of NA?

Subsetting on a column with NA returns a whole row of NA. I know there are multiple ways to avoid this; my question is why does this happen at all? For example:

> d<-data.frame(a = 1:3, b = c(NA, 2, 5))
> d[d$b == 2,]
    a  b
NA NA NA
2   2  2

I would understand if it simply returned row 1 also, but it returns a whole row of NA which never existed in the object I subsetted. This seems strange and unhelpful, and I can't find an explanation of why this behavior exists (again, I know how to prevent it).

CodePudding user response：

It is indeed unintuitive, but if you check d$b == 2 you see that the comparison itself yields NA for the first element:

> d$b == 2
#[1]    NA  TRUE FALSE

And when a row index is NA, R returns a row filled with NA for it:

> d[c(NA, 2), ]
#    a  b
#NA NA NA
#2   2  2

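d[d$b == 2, ] cannot return the first row: for that, the first value of d$b == 2 would have to be TRUE, and here it is NA. R cannot tell whether that row matches, so it returns a row of NA rather than guessing.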
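
To make the mechanism concrete, here is a small sketch showing that the same rule applies to plain vectors, so the data frame case is just the row-wise version of it. The names x and idx are only for illustration and are not part of the question.

```r
# Comparing NA with anything gives NA: the result is unknown, not FALSE
NA == 2
#> [1] NA

# A logical index containing NA yields NA at that position
x <- c(10, 20, 30)
idx <- c(NA, TRUE, FALSE)
x[idx]
#> [1] NA 20

# The data frame from the question behaves the same way, one row per index
d <- data.frame(a = 1:3, b = c(NA, 2, 5))
d[idx, ]
```

The last call should print the same two rows as d[d$b == 2, ] in the question, because d$b == 2 evaluates to exactly this logical vector.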

CodePudding user response：

In addition to Maël's correct answer: to keep only the rows where the condition is TRUE, use which(). It returns the indices at which the condition is TRUE, so both FALSE and NA are dropped.

d <- data.frame(a = 1:3, b = c(NA, 2, 5))
d[d$b == 2,]
#>     a  b
#> NA NA NA
#> 2   2  2

d[which(d$b == 2), ]
#>   a b
#> 2 2 2

Created on 2022-10-28 with reprex v2.0.2
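
For completeness, a few other base R ways to treat NA as "no match" are sketched below. This is not an exhaustive list, and which one reads best is a matter of taste; the result names r1, r2 and r3 are only for illustration.

```r
d <- data.frame(a = 1:3, b = c(NA, 2, 5))

# Explicitly exclude NA before comparing
r1 <- d[!is.na(d$b) & d$b == 2, ]

# subset() treats NA in the condition as FALSE
r2 <- subset(d, b == 2)

# %in% never returns NA, so no NA row can appear
r3 <- d[d$b %in% 2, ]

r1
```

Each of the three should give the single row where b is 2, the same as the which() version above.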