How to detect column that is all zeros and replace with NA while ignoring NA column-CodePudding

I would like to be able to find columns that ONLY have ALL zero values and replace these with NAs. Found a partial solution here: Replace all zero columns with NA but my data frame also contains columns with all NAs and this fails. Here is an example:

DF <- tibble(
         y1 = c(0.0, 0.0, 0.0, 0.0, 0.0),
         y2 = c(0.0, 0.75, 1.5, 3.5, 6.0),
         y3 = c(0.0, 1.0, 2.0, 4.0, 3.0),
         y4 = c(NA, NA, NA, NA, NA))

DF %>% mutate_all(~case_when(all(. == 0,  na.rm = TRUE) ~ NA_real_, TRUE ~ .))

So the goal is to change y1 to all NAs and leave everything else as is. This code works if DF is just y1:y3. Including the NA column throws an error. Note that I altered the original solution to include na.rm in the all() function, but this does not solve the issue. Error returned is

Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]

CodePudding user response：

A possible solution (notice that mutate_all is superseded):

library(dplyr)

DF %>% 
  mutate(across(y1:y4, ~ {if (all(.x == 0, na.rm = T)) NA_real_ else .x}))

#> # A tibble: 5 × 4
#>      y1    y2    y3    y4
#>   <dbl> <dbl> <dbl> <dbl>
#> 1    NA  0        0    NA
#> 2    NA  0.75     1    NA
#> 3    NA  1.5      2    NA
#> 4    NA  3.5      4    NA
#> 5    NA  6        3    NA

CodePudding user response：

Using base R

 DF[!colSums(DF != 0, na.rm = TRUE)] <- NA

-output

> DF
# A tibble: 5 × 4
  y1       y2    y3 y4   
  <lgl> <dbl> <dbl> <lgl>
1 NA     0        0 NA   
2 NA     0.75     1 NA   
3 NA     1.5      2 NA   
4 NA     3.5      4 NA   
5 NA     6        3 NA

With dplyr, we can use a condition in where

DF %>%
    mutate(across(where(~ all(.x== 0, na.rm = TRUE)), ~ NA))

-output

# A tibble: 5 × 4
  y1       y2    y3 y4   
  <lgl> <dbl> <dbl> <lgl>
1 NA     0        0 NA   
2 NA     0.75     1 NA   
3 NA     1.5      2 NA   
4 NA     3.5      4 NA   
5 NA     6        3 NA