The question is simple, but I would like to know if there is an elegant way to do this. I would like to filter out every row that has an NA value in any variable except one arbitrary variable.
Something like this:
data %>% filter(complete.cases(-var1))
Does anyone know how to do this using dplyr? I could list all the variables by hand, but in a dataset with lots of variables this is impractical...
Thanks!
CodePudding user response:
You can do
data %>% filter(complete.cases(select(., -var1)))
This does the job, as the following reprex demonstrates.
First, create a dummy data set where the whole first column is NA:
library(dplyr)
data <- setNames(iris[1:10,], paste0('var', 1:5))
data$var1 <- NA
data
#>    var1 var2 var3 var4   var5
#> 1    NA  3.5  1.4  0.2 setosa
#> 2    NA  3.0  1.4  0.2 setosa
#> 3    NA  3.2  1.3  0.2 setosa
#> 4    NA  3.1  1.5  0.2 setosa
#> 5    NA  3.6  1.4  0.2 setosa
#> 6    NA  3.9  1.7  0.4 setosa
#> 7    NA  3.4  1.4  0.3 setosa
#> 8    NA  3.4  1.5  0.2 setosa
#> 9    NA  2.9  1.4  0.2 setosa
#> 10   NA  3.1  1.5  0.1 setosa
Note that filtering this by complete.cases over all columns returns an empty data frame, because var1 is NA in every row:
data %>% filter(complete.cases(.))
#> [1] var1 var2 var3 var4 var5
#> <0 rows> (or 0-length row.names)
But we can exclude var1 from complete.cases like this:
data %>% filter(complete.cases(select(., -var1)))
#>    var1 var2 var3 var4   var5
#> 1    NA  3.5  1.4  0.2 setosa
#> 2    NA  3.0  1.4  0.2 setosa
#> 3    NA  3.2  1.3  0.2 setosa
#> 4    NA  3.1  1.5  0.2 setosa
#> 5    NA  3.6  1.4  0.2 setosa
#> 6    NA  3.9  1.7  0.4 setosa
#> 7    NA  3.4  1.4  0.3 setosa
#> 8    NA  3.4  1.5  0.2 setosa
#> 9    NA  2.9  1.4  0.2 setosa
#> 10   NA  3.1  1.5  0.1 setosa
Created on 2023-01-17 with reprex v2.0.2
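In the reprex above no row is actually dropped, because the only NA values sit in var1. As a small additional sketch (not part of the original reprex), the version below puts one NA into var2, an arbitrary choice made purely for illustration, so the effect of the filter is visible. It also shows a possible dplyr-only alternative with if_all(), which assumes dplyr 1.0.4 or later; treat that as one option rather than the recommended way.

```r
library(dplyr)

# Same dummy data as in the reprex, plus one NA outside var1
# (the NA in var2, row 3, is an assumption added for illustration)
data <- setNames(iris[1:10, ], paste0("var", 1:5))
data$var1 <- NA
data$var2[3] <- NA

# Approach from the answer: test completeness on every column except var1
kept <- data %>% filter(complete.cases(select(., -var1)))

# Possible alternative, assuming dplyr >= 1.0.4 where if_all() exists:
# keep rows where every column other than var1 is non-NA
kept_if_all <- data %>% filter(if_all(-var1, ~ !is.na(.x)))

# Only the row with the NA in var2 is dropped; var1 is ignored
nrow(kept)
nrow(kept_if_all)
```

Both calls should keep nine of the ten rows: the row with the missing var2 is removed, while the all-NA var1 column does not affect the result.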