Home > Mobile >  Filter rows that have complete missings on specific set of columns r
Filter rows that have complete missings on specific set of columns r

Time:01-25

Say I have the following data:

df <- data.frame(cntry = c("A", "A", "B", "C", "C"),
                 year = c(2010, 2011, 2010, 2010, 2011),
                 env_1 = c(NA,NA,0,0,NA),
                 env_2 = c(NA,NA,1,1,0),
                 env_3 = c(NA,1,1,NA,0))

Which looks like:

cntry year env_1 env_2 env_3
A     2010    NA    NA    NA
A     2011    NA    NA     1
B     2010     0     1     1
C     2010     0     1    NA
C     2011    NA     0     0

I want to remove all rows that have complete missings on all substantive variables, in this case env_1, env_2, and env_3. How can I filter for those, while keeping the variables that have data in some, but missing in others?

I thought something like:

df %>% 
  filter(if_all(starts_with("env"), ~ !is.na(.x)))

But actually it should somehow be the opposite (this removes all rows with at least one NA)

I want to keep the following:

cntry year env_1 env_2 env_3
A     2011    NA    NA     1
B     2010     0     1     1
C     2010     0     1    NA
C     2011    NA     0     0

CodePudding user response:

Here is one (very probably not optimal) solution:

df <- data.frame(cntry = c("A", "A", "B", "C", "C"),
                 year = c(2010, 2011, 2010, 2010, 2011),
                 env_1 = c(NA,NA,0,0,NA),
                 env_2 = c(NA,NA,1,1,0),
                 env_3 = c(NA,1,1,NA,0))

df[rowSums(is.na(df[c('env_1','env_2','env_3')])) != 3,]

output:

  cntry year env_1 env_2 env_3
2     A 2011    NA    NA     1
3     B 2010     0     1     1
4     C 2010     0     1    NA
5     C 2011    NA     0     0

CodePudding user response:

You can also use complete.cases with if_any, avoiding negation:

library(dplyr)
df %>% 
  filter(if_any(starts_with("env"), complete.cases))
  • Related