Logical Indexing with NA in R - How to set to FALSE or exclude rather than return NA?

Time:12-08

Apologies if this is a common question, but it has caused some unexpected frustration in a script I am running. I have a dataset which roughly looks like the following (though much larger in practice):

df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))

My script cycles through a list of values from column A and is supposed to take action based on whether the corresponding values in B are larger than 25. However, the rows where A is missing ALWAYS contribute an element to the result (as NA), whereas I want them to always be excluded. For example,

df$B[df$A == 6]

Gives the output

NA NA 60

Rather than the expected

60

Thus, the code

df$B[df$A == 6] > 25

returns

NA NA TRUE

rather than just

TRUE

Could someone explain the reason for this and any simple solutions? The immediate solution that came to mind is to remove any rows with NA values in column A, but I would prefer a solution which is robust to missingness in A and will only return the single desired logical value from B.

Answer:

Whenever you ask whether a Not Available (NA) value is equal to a number, or to anything else, you get the only possible answer: Not Available (NA).

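NA might be equal to 6, or to John the Baptist, or to ⛄, or to any other object. It is simply impossible to say whether it is, since the value is not available. The comparison df$A == 6 therefore contains NA at the positions where A is missing, and indexing with a logical NA returns NA for that position instead of dropping it.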
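A short sketch of what happens step by step, using the data frame from the question (the name idx is introduced here only for illustration):

```r
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))

# Comparing NA with anything yields NA, not FALSE
NA == 6
# [1] NA

# So the logical index has NA wherever A is missing
idx <- df$A == 6
idx
# [1] FALSE FALSE FALSE    NA    NA  TRUE

# Each NA in a logical index produces an NA element in the result
df$B[idx]
# [1] NA NA 60
```

The two NA elements in the result are not the values 40 and 50 from B; they are placeholders produced by the NA entries in the index.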

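To get the answer you want, you can wrap the result in na.omit() or na.exclude(); note that this also drops any values that are genuinely missing in B. Or you can add a second logical condition when subsetting, so that the comparison is FALSE rather than NA wherever A is missing: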

with(df, B[A == 6 & !is.na(A)])
# [1] 60
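Two other base R idioms should give the same result, shown here as a sketch on the question's data rather than as the one correct approach. which() keeps only the positions that are TRUE, and %in% returns FALSE rather than NA for a missing value:

```r
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))

# which() returns the positions of TRUE only, dropping FALSE and NA
df$B[which(df$A == 6)]
# [1] 60

# %in% never returns NA: a missing value simply does not match 6
df$B[df$A %in% 6]
# [1] 60

# The comparison from the question now yields a single logical value
df$B[which(df$A == 6)] > 25
# [1] TRUE
```

Unlike na.omit() applied to the result, both forms filter on A only, so a genuinely missing value in B for a matching row would still be returned as NA.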