I have the following data with 10 entries:
test_data_1 <- structure(list(Art = c(188, NA, NA, 140, NA, 182, NA, NA, 182,
NA)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
Let's say I want to keep only the NAs, 188 and 140. So I tried the following command:
test_data_1 %>% filter(is.na(Art), Art != 182) # with | instead of a comma, it works
With this command, a tibble with zero entries results. Why do I have to use the | sign instead of a comma? This site (https://sebastiansauer.github.io/dplyr_filter/) states: "Multiple logical comparisons can be combined. Just add ‘em up using commas; that amounts to logical OR “addition”:" So the comma should act as an OR, but it doesn't. Another approach:
test_data_1 %>% filter(Art != 182)
Here, by dplyr default, the 6 NA entries are dropped, which is not what I want. Adding na.rm=FALSE doesn't help either: now zero entries are kept. Why is that? Why aren't at least the entries 188 and 140 kept?
test_data_1 %>% filter(Art != 182, na.rm=FALSE)
Last question: If I want to keep various numbers in a column, I could use %in% followed by a vector, e.g.:
test_data_1 %>% filter(Art %in% c(140,188))
But how could I combine %in% with is.na if I would just like to keep the NAs and e.g. 140?
CodePudding user response:
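Use | instead of a comma. In filter, multiple expressions separated by a comma are combined with &, so a row is kept only if every condition is TRUE. No value can be both NA and not equal to 182 (for an NA, Art != 182 evaluates to NA rather than TRUE), so nothing is returned.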
library(dplyr)
test_data_1 %>%
  filter(is.na(Art) | Art != 182)
-output
# A tibble: 8 × 1
    Art
  <dbl>
1   188
2    NA
3    NA
4   140
5    NA
6    NA
7    NA
8    NA
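To see why the comma version returns nothing, it can help to evaluate both conditions on the bare vector. This is only a sketch in base R; the names art, and_cond and or_cond are made up for illustration and do not appear in the question.

```r
# The Art column from the question, as a plain vector (hypothetical name `art`)
art <- c(188, NA, NA, 140, NA, 182, NA, NA, 182, NA)

# What the comma in filter() amounts to: both conditions must hold
and_cond <- is.na(art) & art != 182

# What | does: either condition may hold
or_cond <- is.na(art) | art != 182

# filter() keeps a row only where the condition is TRUE;
# which() mimics that here by ignoring FALSE and NA
length(which(and_cond))  # 0
art[which(or_cond)]      # 188 NA NA 140 NA NA NA NA
```

For the NA entries, art != 182 is NA, so TRUE & NA stays NA while TRUE | NA becomes TRUE. That is why only the | version keeps them.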
The second part of the question is about %in%. We can use | again:
test_data_1 %>%
  filter(Art %in% c(140, 188) | is.na(Art))
# A tibble: 8 × 1
    Art
  <dbl>
1   188
2    NA
3    NA
4   140
5    NA
6    NA
7    NA
8    NA
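For the exact case asked about (keep only the NAs and 140), the same pattern applies with a shorter lookup vector. The sketch below rebuilds the data with tibble() rather than structure(), and keep_a and keep_b are illustrative names. It also shows that %in% matches NA against NA, so NA can be put inside the vector instead of calling is.na separately.

```r
library(dplyr)

# Same values as test_data_1 in the question
test_data_1 <- tibble(Art = c(188, NA, NA, 140, NA, 182, NA, NA, 182, NA))

# Explicit version: 140 or missing
keep_a <- test_data_1 %>%
  filter(Art %in% 140 | is.na(Art))

# %in% treats NA as a matchable value, so it can go in the lookup vector
keep_b <- test_data_1 %>%
  filter(Art %in% c(140, NA))

nrow(keep_a)                       # 7
identical(keep_a$Art, keep_b$Art)  # TRUE
```

Which form to use is a matter of taste; the is.na version is arguably easier to read for someone who does not know how %in% handles NA.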
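NOTE: By default, filter keeps only the rows where the condition evaluates to TRUE, so rows where it evaluates to NA are dropped. In addition, filter has no na.rm argument. In your case na.rm=FALSE appears to have been treated as one more condition that is always FALSE, which would explain why no rows were kept.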
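Since there is no na.rm switch, one way to exclude 182 without losing the NAs is to negate %in%, which returns FALSE rather than NA for missing values. This is an alternative to the is.na(Art) | Art != 182 form, not something filter does for you; dropped and kept are illustrative names.

```r
library(dplyr)

# Same values as test_data_1 in the question
test_data_1 <- tibble(Art = c(188, NA, NA, 140, NA, 182, NA, NA, 182, NA))

# Art != 182 is NA for missing values, so those rows are dropped
dropped <- test_data_1 %>% filter(Art != 182)

# Art %in% 182 is FALSE (never NA) for missing values,
# so its negation is TRUE and the NA rows are kept
kept <- test_data_1 %>% filter(!(Art %in% 182))

nrow(dropped)  # 2
nrow(kept)     # 8
```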