Subsetting on a column with NA returns a whole row of NA. I know there are multiple ways to avoid this; my question is why does this happen at all? For example:
> d <- data.frame(a = 1:3, b = c(NA, 2, 5))
> d[d$b == 2, ]
    a  b
NA NA NA
2   2  2
I would understand if it simply returned row 1 also, but it returns a whole row of NA which never existed in the object I subsetted. This seems strange and unhelpful, and I can't find an explanation of why this behavior exists (again, I know how to prevent it).
CodePudding user response:
It is unintuitive indeed, but if you check d$b == 2, you see that:
> d$b == 2
#[1] NA TRUE FALSE
And when a row index is NA, the result contains a row of NA in that position:
> d[c(NA, 2), ]
#     a  b
#NA NA NA
#2   2  2
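d[d$b == 2, ] cannot return the first row: for that, the first value of d$b == 2 would have to be TRUE, and here it is NA. R cannot tell whether that row matches the condition, so it neither keeps nor drops it and returns a row of NA instead.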
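The same rule shows up on a plain vector, which may make the data frame case less surprising. A minimal sketch, where the vector x is made up purely for illustration: an NA in a logical index means "unknown whether to select", so the result holds NA in that position.

```r
# Hypothetical vector, used only for illustration
x <- c(10, 20, 30)

# Comparing a missing value with 2 gives NA, not FALSE
cmp <- NA == 2
cmp
#> [1] NA

# An NA in a logical index yields an NA element in the result;
# TRUE keeps the element and FALSE drops it
res <- x[c(NA, TRUE, FALSE)]
res
#> [1] NA 20
```

Indexing a data frame by rows follows the same logic, except that the "unknown" element is a whole row, so every column of that row is NA.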
CodePudding user response:
In addition to Maël's correct answer: to return only the rows where the condition is TRUE, use which. It returns the indices at which the condition is TRUE, dropping both FALSE and NA.
d <- data.frame(a = 1:3, b = c(NA, 2, 5))
d[d$b == 2, ]
#>     a  b
#> NA NA NA
#> 2   2  2
d[which(d$b == 2), ]
#>   a b
#> 2 2 2
Created on 2022-10-28 with reprex v2.0.2
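For comparison, here is a short sketch of a few other base R idioms that should give the same result as which on this example, because each of them treats NA as "not selected". The variable names r1, r2 and r3 are only illustrative.

```r
d <- data.frame(a = 1:3, b = c(NA, 2, 5))

# Explicitly exclude missing values before comparing
r1 <- d[!is.na(d$b) & d$b == 2, ]

# %in% never returns NA, so the NA entry becomes FALSE
r2 <- d[d$b %in% 2, ]

# subset() treats NA in the condition as FALSE
r3 <- subset(d, b == 2)

r1
#>   a b
#> 2 2 2
```

Which one to use is mostly a matter of taste; which and %in% keep the indexing syntax, while subset reads more like a filter.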