How to drop rows with NA (missing values) based on multiple conditions in group column?
Here is the dummy data:
a <- data.frame(c('b1', 'b2', 'b3', 'b4'),
c(NA, 1.5, 0.5, 1),
c(0.4, NA, 0.3, NA),
c(0.5, NA, NA,2),
c(-0.5, -2.5, -0.2,NA),
c(NA, NA, -0.4,NA),
c(-0.5, NA, -0.4,NA),
stringsAsFactors = FALSE)
colnames(a) <- c('id', 'group1_1', 'group1_2', 'group1_3', 'group2_1', 'group2_2', 'group2_3')
rownames(a) <- a$id
a_subset <- a[, 2:7]
a_subset
# group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1 NA 0.4 0.5 -0.5 NA -0.5
# b2 1.5 NA NA -2.5 NA NA
# b3 0.5 0.3 NA -0.2 -0.4 -0.4
# b4 1.0 NA 2.0 NA NA NA
As you can see from above dataframe, group_1 and group_2 contains some missing values, and each group has triplicates.
Expected output:
# group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
# b1 NA 0.4 0.5 -0.5 NA -0.5
# b3 0.5 0.3 NA -0.2 -0.4 -0.4
# b4 1.0 NA 2.0 NA NA NA
From above you can see if 1 group contains at least 2 values it will not be removed.
Is it possible with dplyr approach?
CodePudding user response:
tidyverse
library(tidyverse)
a_subset %>%
filter(
rowSums(!is.na(across(starts_with("group1_")))) >= 2 |
rowSums(!is.na(across(starts_with("group2_")))) >= 2)
#> group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
#> b1 NA 0.4 0.5 -0.5 NA -0.5
#> b3 0.5 0.3 NA -0.2 -0.4 -0.4
#> b4 1.0 NA 2.0 NA NA NA
data
a_subset <- data.frame(
row.names = c("b1", "b2", "b3", "b4"),
group1_1 = c(NA, 1.5, 0.5, 1),
group1_2 = c(0.4, NA, 0.3, NA),
group1_3 = c(0.5, NA, NA, 2),
group2_1 = c(-0.5, -2.5, -0.2, NA),
group2_2 = c(NA, NA, -0.4, NA),
group2_3 = c(-0.5, NA, -0.4, NA)
)