Say I have the following data:
df <- data.frame(cntry = c("A", "A", "B", "C", "C"),
year = c(2010, 2011, 2010, 2010, 2011),
env_1 = c(NA,NA,0,0,NA),
env_2 = c(NA,NA,1,1,0),
env_3 = c(NA,1,1,NA,0))
Which looks like:
cntry year env_1 env_2 env_3
A 2010 NA NA NA
A 2011 NA NA 1
B 2010 0 1 1
C 2010 0 1 NA
C 2011 NA 0 0
I want to remove all rows that have complete missings on all substantive variables, in this case env_1
, env_2
, and env_3
. How can I filter for those, while keeping the variables that have data in some, but missing in others?
I thought something like:
df %>%
filter(if_all(starts_with("env"), ~ !is.na(.x)))
But actually it should somehow be the opposite (this removes all rows with at least one NA)
I want to keep the following:
cntry year env_1 env_2 env_3
A 2011 NA NA 1
B 2010 0 1 1
C 2010 0 1 NA
C 2011 NA 0 0
CodePudding user response:
Here is one (very probably not optimal) solution:
df <- data.frame(cntry = c("A", "A", "B", "C", "C"),
year = c(2010, 2011, 2010, 2010, 2011),
env_1 = c(NA,NA,0,0,NA),
env_2 = c(NA,NA,1,1,0),
env_3 = c(NA,1,1,NA,0))
df[rowSums(is.na(df[c('env_1','env_2','env_3')])) != 3,]
output:
cntry year env_1 env_2 env_3
2 A 2011 NA NA 1
3 B 2010 0 1 1
4 C 2010 0 1 NA
5 C 2011 NA 0 0
CodePudding user response:
You can also use complete.cases
with if_any
, avoiding negation:
library(dplyr)
df %>%
filter(if_any(starts_with("env"), complete.cases))