How to filter complete cases in all variables but one?

Time:01-18

The question is simple, but I would like to know if there is an elegant way to do this. I would like to drop every row that has an NA in any variable except one arbitrary variable, whose NAs should be ignored.

Something like this:

data %>% filter(complete.cases(-var1))

Does anyone know how to do this with dplyr? I could list all the other variables by hand, but in a dataset with many variables that is impractical.

Thanks!

CodePudding user response:

You can do this:

data %>% filter(complete.cases(select(., -var1)))

This does the job, as the following reprex demonstrates.

First, create a dummy data set where the whole first column is NA:

library(dplyr)

data <- setNames(iris[1:10,], paste0('var', 1:5))
data$var1 <- NA

data
#>    var1 var2 var3 var4   var5
#> 1    NA  3.5  1.4  0.2 setosa
#> 2    NA  3.0  1.4  0.2 setosa
#> 3    NA  3.2  1.3  0.2 setosa
#> 4    NA  3.1  1.5  0.2 setosa
#> 5    NA  3.6  1.4  0.2 setosa
#> 6    NA  3.9  1.7  0.4 setosa
#> 7    NA  3.4  1.4  0.3 setosa
#> 8    NA  3.4  1.5  0.2 setosa
#> 9    NA  2.9  1.4  0.2 setosa
#> 10   NA  3.1  1.5  0.1 setosa

Note that filtering this by complete.cases returns an empty data frame, because every row has an NA in var1:

data %>% filter(complete.cases(.))
#> [1] var1 var2 var3 var4 var5
#> <0 rows> (or 0-length row.names)

But we can exclude var1 from complete.cases like this:

data %>% filter(complete.cases(select(., -var1)))
#>    var1 var2 var3 var4   var5
#> 1    NA  3.5  1.4  0.2 setosa
#> 2    NA  3.0  1.4  0.2 setosa
#> 3    NA  3.2  1.3  0.2 setosa
#> 4    NA  3.1  1.5  0.2 setosa
#> 5    NA  3.6  1.4  0.2 setosa
#> 6    NA  3.9  1.7  0.4 setosa
#> 7    NA  3.4  1.4  0.3 setosa
#> 8    NA  3.4  1.5  0.2 setosa
#> 9    NA  2.9  1.4  0.2 setosa
#> 10   NA  3.1  1.5  0.1 setosa

Created on 2023-01-17 with reprex v2.0.2
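
In the reprex above no other column contains an NA, so it does not show a row actually being dropped. Here is a minimal sketch of a variant that does, written with if_all() instead of complete.cases(). It assumes dplyr 1.0.4 or later (where if_all() is available), and the toy data frame and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: var1 is entirely NA, and var3 has one NA (row 2)
toy <- data.frame(
  var1 = c(NA, NA, NA),
  var2 = c(1, 2, 3),
  var3 = c("a", NA, "c")
)

# Keep rows where every column except var1 is non-missing
kept <- toy %>% filter(if_all(-var1, ~ !is.na(.x)))

# Same filter using the complete.cases() approach from the answer
kept_cc <- toy %>% filter(complete.cases(select(., -var1)))

nrow(kept)     # 2: row 2 is dropped because var3 is NA there
nrow(kept_cc)  # 2: both approaches keep the same rows
```

Both versions ignore the NAs in var1 and drop only the row with a missing var3. If tidyr is available, tidyr::drop_na(toy, -var1) should give the same rows, since drop_na() accepts the same kind of column selection.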
