In R, why does subsetting on a column with NA return a whole row of NA?

Time:10-29

Subsetting on a column with NA returns a whole row of NA. I know there are multiple ways to avoid this; my question is why does this happen at all? For example:

> d<-data.frame(a = 1:3, b = c(NA, 2, 5))
> d[d$b == 2,]
    a  b
NA NA NA
2   2  2

I would understand if it simply returned row 1 also, but it returns a whole row of NA which never existed in the object I subsetted. This seems strange and unhelpful, and I can't find an explanation of why this behavior exists (again, I know how to prevent it).

CodePudding user response:

It is unintuitive, but look at what d$b == 2 actually evaluates to. Comparing NA with anything gives NA, so the first element is NA rather than FALSE:

> d$b == 2
#[1]    NA  TRUE FALSE

And when you index rows with NA, R returns a row filled with NA for each NA in the index:

> d[c(NA, 2), ]
#    a  b
#NA NA NA
#2   2  2

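d[d$b == 2, ] cannot return the first row: a row is selected only when its index value is TRUE, and here it is NA. R does not know whether the first row matches, so it returns a row of NA instead of either keeping or dropping it.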
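
The same rule can be seen on a plain vector, which may make the data frame behaviour less surprising. This is only a small sketch; x and idx are made-up names for illustration:

```r
# A logical index with NA: the NA position yields NA in the result
x <- c(10, 20, 30)
idx <- c(NA, TRUE, FALSE)

res <- x[idx]
res
#> [1] NA 20

# The comparison itself is where the NA comes from
NA == 2
#> [1] NA
```

A data frame applies this to rows, so the "unknown" element becomes a whole row of NA.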

CodePudding user response:

In addition to Maël's correct answer: to return only the rows where the condition is TRUE, wrap it in which(). It returns the indices for which the condition is TRUE and drops both FALSE and NA.

d <- data.frame(a = 1:3, b = c(NA, 2, 5))
d[d$b == 2,]
#>     a  b
#> NA NA NA
#> 2   2  2

d[which(d$b == 2), ]
#>   a b
#> 2 2 2

Created on 2022-10-28 with reprex v2.0.2
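
For completeness, here are a few other base R ways that should give the same result as which(). This is a sketch rather than an exhaustive list, and r1, r2, r3 are just illustrative names:

```r
d <- data.frame(a = 1:3, b = c(NA, 2, 5))

# subset() treats NA in the condition as FALSE
r1 <- subset(d, b == 2)

# %in% never returns NA: NA %in% 2 is FALSE
r2 <- d[d$b %in% 2, ]

# Explicitly rule out NA before comparing
r3 <- d[!is.na(d$b) & d$b == 2, ]
```

Each of these keeps only the row where b is 2. Which one to use is mostly a matter of taste; which() and %in% are the shortest when indexing with [ directly.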
