How to filter for rows containing NA?

Time:11-22

If x or y is NA, I want to keep that row, and discard the rows where both x and y are not NA. I tried dplyr::filter(), purrr::keep(), and more, but nothing worked. It is essential to do this conditionally and not by row number, since my data set is too large for that.

library(tibble, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)

df <- tribble(
  ~name, ~x, ~y, 
  "id_1", 1, NA,
  "id_2", 3, NA,
  "id_3", NA, 29,
  "id_4", -99, 0,
  "id_5", -98, 28,
) %>%
  mutate(name = factor(name))

df
#> # A tibble: 5 x 3
#>   name      x     y
#>   <fct> <dbl> <dbl>
#> 1 id_1      1    NA
#> 2 id_2      3    NA
#> 3 id_3     NA    29
#> 4 id_4    -99     0
#> 5 id_5    -98    28

Created on 2022-11-21 with reprex v2.0.2

The target is to keep rows like 1 to 3.

CodePudding user response:

You can use filter() with if_any() to keep the rows that contain NA values. For example:

df %>% filter(if_any(everything(), is.na))

If you only want to check some of the columns rather than all of them, you could use any of these, for example:

df %>% filter(if_any(c(x, y), is.na))
df %>% filter(if_any(x:y, is.na))
df %>% filter(if_any(-name, is.na))
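
As a quick check of the if_any() approach on the question's data, the sketch below rebuilds df and confirms that only id_1 to id_3 survive. It assumes dplyr 1.0.4 or later (where if_any() became available); the name na_rows is only illustrative.

```r
library(tibble)
library(dplyr)

# Same data as in the question
df <- tribble(
  ~name, ~x, ~y,
  "id_1", 1, NA,
  "id_2", 3, NA,
  "id_3", NA, 29,
  "id_4", -99, 0,
  "id_5", -98, 28
) %>%
  mutate(name = factor(name))

# Keep a row if x OR y is NA; rows where both values are present are dropped.
# na_rows is an illustrative name, not part of the original answer.
na_rows <- df %>% filter(if_any(c(x, y), is.na))

as.character(na_rows$name)
#> [1] "id_1" "id_2" "id_3"
```

Because the condition is evaluated per row, this scales to large data without referring to row numbers.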

CodePudding user response:

Using rowSums, check whether a row has at least one NA:

df[rowSums(is.na(df)) > 0, ]
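
The comparison operator matters here: `== 1` keeps rows with exactly one NA, while `> 0` keeps rows with at least one. On the question's data both give the same result, but they differ as soon as a row has NA in both columns. A minimal sketch on made-up data (the data frame d is hypothetical, not from the question):

```r
# Hypothetical data: row "b" has NA in both x and y.
d <- data.frame(
  name = c("a", "b", "c"),
  x = c(1, NA, 3),
  y = c(NA, NA, 4)
)

# Number of NA values per row: 1 for "a", 2 for "b", 0 for "c"
na_count <- rowSums(is.na(d))

at_least_one <- d[na_count > 0, ]   # keeps rows "a" and "b"
exactly_one  <- d[na_count == 1, ]  # keeps row "a" only
```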

CodePudding user response:

Base R solutions

df[!complete.cases(df),] 

df[is.na(df$x) | is.na(df$y),] # if you want to specify specific columns
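
If many columns need checking, writing out one is.na() term per column gets tedious. One possible variation, sketched here with a plain data.frame copy of the question's data and a hypothetical cols vector, is to pass only the columns of interest to complete.cases():

```r
# Plain data.frame version of the question's data
df <- data.frame(
  name = factor(c("id_1", "id_2", "id_3", "id_4", "id_5")),
  x = c(1, 3, NA, -99, -98),
  y = c(NA, NA, 29, 0, 28)
)

# cols is a hypothetical name for the columns to inspect
cols <- c("x", "y")

# complete.cases() is TRUE for rows with no NA in the selected columns,
# so negating it keeps rows with at least one NA among them.
res <- df[!complete.cases(df[, cols]), ]

as.character(res$name)
#> [1] "id_1" "id_2" "id_3"
```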

Alternative packages solution

library(hacksaw)
df %>% keep_na(x, y, .logic = 'OR')

Output

> # A tibble: 3 × 3
>   name      x     y
>   <fct> <dbl> <dbl>
> 1 id_1      1    NA
> 2 id_2      3    NA
> 3 id_3     NA    29