Apologies if this is a common question, but it has caused some unexpected frustration in a script I am running. I have a dataset which roughly looks like the following (though much larger in practice):
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))
My script cycles through a list of values from column A and is supposed to take action based on whether the values in B are larger than 25. However, the corresponding values of B for missing values in A are ALWAYS returned, whereas I want them to always be excluded. For example,
df$B[df$A == 6]
Gives the output
NA NA 60
Rather than the expected
60
Thus, the code
df$B[df$A == 6] > 25
returns
NA NA TRUE
rather than just
TRUE
Could someone explain the reason for this and any simple solutions? The immediate solution that came to mind is to remove any rows with NA values in column A, but I would prefer a solution which is robust to missingness in A and will only return the single desired logical value from B.
CodePudding user response:
Whenever you ask whether a Not Available (NA) value is equal to a number, or to anything else, you get the only possible answer: Not Available (NA).
NA might be equal to 6, or to John the Baptist, or to ⛄, or to any other object. It is simply impossible to say whether it is, since the value is not available.
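A small sketch of how this plays out with the data frame from the question (the name `idx` is only for illustration): the comparison yields NA wherever A is missing, and an NA in a logical index produces an NA in the result instead of dropping that element.

```r
# Data frame copied from the question
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))

# Comparing NA with anything gives NA, not FALSE
idx <- df$A == 6
idx
# [1] FALSE FALSE FALSE    NA    NA  TRUE

# Each NA in the logical index becomes an NA in the result,
# which is where the two extra NA values come from
df$B[idx]
# [1] NA NA 60
```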
To get the answer you want, you can call na.omit() or na.exclude() on the result. Alternatively, add a second logical condition when subsetting:
with(df, B[A == 6 & !is.na(A)])
# [1] 60
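A few other base R idioms treat an NA in the condition as "no match", so the explicit is.na() check can be left out. This is a minimal sketch using the question's `df`. Which variant reads best is largely a matter of taste.

```r
# Data frame copied from the question
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))

# which() returns only the positions where the condition is TRUE
df$B[which(df$A == 6)]
# [1] 60

# %in% never returns NA: a missing value simply does not match 6
df$B[df$A %in% 6]
# [1] 60

# subset() drops rows where the condition evaluates to NA
subset(df, A == 6)$B
# [1] 60

# The comparison from the question now gives a single logical value
df$B[which(df$A == 6)] > 25
# [1] TRUE
```

Unlike na.omit() applied to the result, these filter on A only, so a genuinely missing value in B for a matching row would still be returned as NA.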