Remove data.frame columns if all values in a certain range, pre-determined by a group, are NA-CodePudding

I would like to remove all data.frame columns, where ALL of its values are NA. Additionally, I want to exclude those columns, where only for a certain number of rows (defined by a grouping variable) are NA.

Example:

library(tidyverse)

df <- tribble(~group, ~id, ~x, ~y, ~z,
              "1", 1, 1,  1,  NA,
              "1", 2, NA, 1,  NA,
              "2", 1, 3,  1,  NA,
              "2", 2, NA, 1,  NA,
              "3", 1, 5,  1,  NA,
              "3", 2, NA, 1,  NA,
              "4", 1, 7,  NA, 1,
              "4", 2, NA, NA, 1,
              "5", 1, 9,  NA, NA,
              "5", 2, NA, NA, NA)

In this example, I would like to keep xand z, but remove y. That is because for the groups 4 and 5, its values are all NA. The other two columns also have a lot of missing values, but crucially have at least some values in groups 4 to 5. Ideally, I would like to do that with {dplyr}.

Code / Problem:

I'm kind of stuck with this code, which, for obvious reasons, doesn't work.

df %>% 
  mutate(group_new = ifelse(group=="4"|group=="5", 1, 0)) %>%
  group_by(group_new) %>%
  select_if(function(col) all(!is.na(col)))

There are several problems with this, which is that I would like to avoid creating a new grouping variable, and of course the fact that select_if doesn't take the group_by() condition into account but rather looks for complete columns (of which there are none).

Any help with this is much appreciated!

CodePudding user response：

library(dplyr)
df %>% 
  select(-where(~ all(is.na(.x[df$group %in% 4:5]))))

output

# A tibble: 10 × 4
   group    id     x     z
   <chr> <dbl> <dbl> <dbl>
 1 1         1     1    NA
 2 1         2    NA    NA
 3 2         1     3    NA
 4 2         2    NA    NA
 5 3         1     5    NA
 6 3         2    NA    NA
 7 4         1     7     1
 8 4         2    NA     1
 9 5         1     9    NA
10 5         2    NA    NA