I'm working with a huge database and need to sum all columns that start with a specific string. I also need to ignore NA's when at least one of the columns is not NA (for example 1 5 NA = 6); but when all columns are NA, the sum must be NA.
I have done the NA ignore thing successfully with sum_ from package hablar, but when I tried to combine it with start_With, it took endless time and did not seem to work. Here is a reproducible example:
library(dplyr)
set.seed(8)
id <- seq(1,700, by = 1)
a1_x <- runif(700, 0, 10)
a1_y <- runif(700, 0, 10)
a2_x <- runif(700, 0, 10)
df <- data.frame(id, a1_x, a1_y, a2_x)
df <- round(df, digits = -0.1)
df[df == 0] <- NA
#Try with rowsums. This works but I need to ignore na just when both columns are NA
df <- df %>%
mutate(a1_sum = rowSums(select(., starts_with('a1'))))
#Tried with sum_ of package 'hablar' but it takes endless time
library(hablar)
df <- df %>%
mutate(a1_sum = sum_(select(., starts_with('a1'))))
Does anybody knows how to achieve this? Thanks in advance!
CodePudding user response:
We can use two sequential calls to mutate()
.
In the second call, we can use if_else()
to replace values with NA if_all()
values are NA in that row. We can also use replace
or purrr::modify_if
.
df %>% mutate(a1_sum = rowSums(across(starts_with('a1')), na.rm = TRUE),
a1_sum = if_else(if_all(c(starts_with('a1'), -a1_sum), is.na),
NA_real_,
a1_sum))
### OR ###
df %>% mutate(a1_sum = rowSums(across(starts_with('a1')), na.rm = TRUE),
a1_sum = replace(a1_sum, if_all(c(starts_with('a1'), -a1_sum), is.na),
NA))
In case it is impossible to have zeroes in the actual data (the example df was hard coded to have no zeroes, not sure if it would happen with actual data), we can also use the simpler mutate %>% na_if()
:
df %>% mutate(a1_sum = rowSums(across(starts_with('a1')), na.rm = TRUE) %>%
na_if(0))
CodePudding user response:
Maybe this is also of help. Take a helper function and call it with c_across and rowwise().
my_fun <- function(x){
if(all(is.na(x))) return(NA) else return(sum(x,na.rm = TRUE))
}
df %>%
rowwise() %>%
mutate(a1_sum = my_fun(c_across(starts_with("a1"))))