There are several ways to identify and manipulate individual cells with missing data in R, e.g., with complete.cases
or even rowSums
.
However, I've not been able to find---or figure out myself---an expedient way to select rows that have missing data within a subsetted range of columns.
For example, in dataframe df
:
df <- data.frame(D1 = c('A', 'B', 'C', 'D'),
D2 = c(NA, 0, 1, 1),
V1 = c(11, NA, 33, NA),
V2 = c(111, 222, NA, NA)
)
df
# D1 D2 V1 V2
# A NA 11 111
# B 0 NA 222
# C 1 33 NA
# D 1 NA NA
I would like to select all rows that have missing data in both columns V1
and V2
, thus selecting row D
but not rows B
or C
(or A
).
I have a larger range of columns than given in that toy example, so selecting a set of columns with, e.g., &&
could make for a long command.
N.B., a similar SO question addresses selecting rows where none are NS
s.
CodePudding user response:
You can try this:
df %>% filter(is.na(V1) & is.na(V2))
OUTPUT
D1 D2 V1 V2
1 D 1 NA NA
CodePudding user response:
You can use dplyr::if_all
. You can select the columns very flexibly with tidyselect
, for instance using :
, c
, starts_with
...
library(dplyr)
df %>%
filter(if_all(V1:V2, is.na))
# D1 D2 V1 V2
#1 D 1 NA NA
Also works (this shows the flexibility of tidyselect
):
filter(df, if_all(3:4, is.na))
filter(df, if_all(starts_with("V"), is.na))
filter(df, if_all(c(V1, V2), is.na))
filter(df, if_all((last_col()-1):last_col(), is.na))
filter(df, if_all(num_range("V", 1:2), is.na))