In dplyr, how do you filter to remove NA values from columns in a character vector?

I'd like to remove rows with NA in any one of the columns in a vector of column names.

Here's a simplified example with just a couple of columns.

data <- structure(list(sample_id = c("2023.01.12_2", "2023.01.12_27", 
"2023.01.12_27", "2023.01.12_3", "2023.01.12_27", "2023.01.12_27", 
"2023.01.12_4", "2023.01.12_27", "2023.01.12_27", "2023.01.12_5"
), group = c("Unedited", "Rob", "Rob", "Partial_promoter", "Rob", 
"Rob", "Promoter_and_ATG", "Rob", "Rob", "ATG"), day = c(6, NA, 
NA, 6, NA, NA, 6, NA, NA, 6), x = c(114.243333333333, 115.036666666667, 
115.073333333333, 114.41, 116.11, 116.163333333333, 113.426666666667, 
116.15, 117.253333333333, 113.46)), row.names = c(NA, -10L), class = "data.frame")

cols <- c("group", "day")

I've tried a few approaches. The one below seems to work for removing the rows with NA:

data %>%
  filter(across(.cols = cols, .fns = ~ !is.na(.x)))

But when I try reversing it, to select the rows that have NA (for QC purposes I want to keep them, but just separately), I get nothing:

data %>%
  filter(across(.cols = cols, .fns = ~ is.na(.x)))

Any ideas?

CodePudding user response：

You may want to use if_any, which keeps a row when the is.na condition is met by either group or day:

 data %>%
   filter(if_any(.cols = cols, .fns = ~is.na(.x)))
      sample_id group day        x
1 2023.01.12_27   Rob  NA 115.0367
2 2023.01.12_27   Rob  NA 115.0733
3 2023.01.12_27   Rob  NA 116.1100
4 2023.01.12_27   Rob  NA 116.1633
5 2023.01.12_27   Rob  NA 116.1500
6 2023.01.12_27   Rob  NA 117.2533

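There is also an if_all helper, which checks whether all of the columns meet the condition of being NA:

 data %>%
   filter(if_all(.cols = cols, .fns = ~is.na(.x)))

This returns no rows because only day contains NA values; group is never NA. It is also why your reversed across() version came back empty: inside filter, across() required the condition to hold for every column.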

Because dplyr warns about using across inside filter, you can replace your first filter with:

data %>%
  filter(if_all(.cols = cols, .fns = ~ !is.na(.x)))
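
Putting the two helpers together, here is a minimal sketch of splitting the data into a clean set and a QC set. It uses a small stand-in data frame rather than your full data, and the names complete_rows and na_rows are just illustrative. I've also wrapped cols in all_of(), which is not in the code above; it is the usual tidyselect way to pass a character vector of column names and should avoid the note about external vectors.

```r
library(dplyr)

# Stand-in for the question's data (shortened, hypothetical values)
data <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4"),
  group = c("Unedited", "Rob", "Rob", "ATG"),
  day = c(6, NA, NA, 6),
  x = c(114.2, 115.0, 115.1, 113.5)
)
cols <- c("group", "day")

# Rows where every listed column is non-missing
complete_rows <- data %>%
  filter(if_all(all_of(cols), ~ !is.na(.x)))

# Rows where at least one listed column is missing (kept separately for QC)
na_rows <- data %>%
  filter(if_any(all_of(cols), ~ is.na(.x)))

# The two sets together account for every original row
stopifnot(nrow(complete_rows) + nrow(na_rows) == nrow(data))
```

Since "not all non-missing" is the same as "any missing", the two filters are exact complements, so no row is lost or counted twice.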

CodePudding user response：

You could use drop_na and any_of based on the columns you mentioned. Here is some reproducible code:

cols <- c("group", "day")
library(tidyr)
data |>
  drop_na(any_of(cols))
#>      sample_id            group day        x
#> 1 2023.01.12_2         Unedited   6 114.2433
#> 2 2023.01.12_3 Partial_promoter   6 114.4100
#> 3 2023.01.12_4 Promoter_and_ATG   6 113.4267
#> 4 2023.01.12_5              ATG   6 113.4600

Created on 2023-01-16 with reprex v2.0.2