Home > database >  How to filter rows with NA based on multiple conditions
How to filter rows with NA based on multiple conditions

Time:06-08

How to drop rows with NA (missing values) based on multiple conditions in group column?

Here is the dummy data:

a <- data.frame(c('b1', 'b2', 'b3', 'b4'),
                c(NA, 1.5, 0.5, 1),
                c(0.4, NA, 0.3, NA),
                c(0.5, NA, NA,2),
                c(-0.5, -2.5, -0.2,NA),
                c(NA, NA, -0.4,NA),
                c(-0.5, NA, -0.4,NA),
                stringsAsFactors = FALSE)

colnames(a) <- c('id', 'group1_1', 'group1_2', 'group1_3', 'group2_1', 'group2_2', 'group2_3')
rownames(a) <- a$id

a_subset <- a[, 2:7]
a_subset

#    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1       NA      0.4      0.5     -0.5       NA     -0.5
# b2      1.5       NA       NA     -2.5       NA       NA
# b3      0.5      0.3       NA     -0.2     -0.4     -0.4
# b4      1.0       NA      2.0       NA       NA       NA

As you can see from above dataframe, group_1 and group_2 contains some missing values, and each group has triplicates.

Expected output:

#    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1       NA      0.4      0.5     -0.5       NA     -0.5
# b3      0.5      0.3       NA     -0.2     -0.4     -0.4
# b4      1.0       NA      2.0       NA       NA       NA

From above you can see if 1 group contains at least 2 values it will not be removed.

Is it possible with dplyr approach?

CodePudding user response:

tidyverse


library(tidyverse)

a_subset %>% 
  filter(
    rowSums(!is.na(across(starts_with("group1_")))) >= 2 | 
      rowSums(!is.na(across(starts_with("group2_")))) >= 2)

#>    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
#> b1       NA      0.4      0.5     -0.5       NA     -0.5
#> b3      0.5      0.3       NA     -0.2     -0.4     -0.4
#> b4      1.0       NA      2.0       NA       NA       NA

data

a_subset <- data.frame(
   row.names = c("b1", "b2", "b3", "b4"),
    group1_1 = c(NA, 1.5, 0.5, 1),
    group1_2 = c(0.4, NA, 0.3, NA),
    group1_3 = c(0.5, NA, NA, 2),
    group2_1 = c(-0.5, -2.5, -0.2, NA),
    group2_2 = c(NA, NA, -0.4, NA),
    group2_3 = c(-0.5, NA, -0.4, NA)
)
  • Related