how to compute row means iff the number of NA's is smaller than a given value

Time:07-29

I have questionnaire data (rows = individuals, columns = scores on questions) and would like to compute a sum score for each individual who answered at least a given number of questions; otherwise the sum score should be NA. The code below computes row sums, counts the number of NAs per row, assigns a value that cannot otherwise occur (100) to the row sum when the number of NAs is too large (here, 2 or more), and then replaces that value with NA. The code works, but I bet there is a more elegant way. Suggestions much appreciated.

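library(dplyr)

dum <- tibble(x = c(1, NA, 2, 3, 4), y = c(1, 2, 3, NA, 5), z = c(1, NA, 2, 3, 4))
# Row sums over x:z, ignoring NAs
dum <- dum %>%
  mutate(sumsum = rowSums(select(., x:z), na.rm = TRUE))
# Number of NAs per row
dum <- dum %>%
  mutate(countna = rowSums(is.na(select(., x:z))))
# Mark rows with 2 or more NAs using a sentinel value (100) ...
dum <- dum %>%
  mutate(sumsum = case_when(countna >= 2 ~ 100, TRUE ~ sumsum))
# ... then turn the sentinel into NA
dum <- dum %>%
  mutate(sumsum = na_if(sumsum, 100))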

CodePudding user response:

You can combine your code into a single statement:

library(dplyr)

dum <- tibble(x=c(1,NA,2,3,4),y=c(1,2,3,NA,5),z=c(1,NA,2,3,4))

dum <- dum %>%
  mutate(sumsum = replace(rowSums(select(., x:z), na.rm = TRUE), 
                          rowSums(is.na(select(., x:z))) >= 2, NA))

dum
# A tibble: 5 × 4
#      x     y     z sumsum
#  <dbl> <dbl> <dbl>  <dbl>
#1     1     1     1      3
#2    NA     2    NA     NA
#3     2     3     2      7
#4     3    NA     3      6
#5     4     5     4     13
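
Since the title asks about row means, here is a minimal sketch of the same pattern with rowMeans() in place of rowSums(). It uses base R only, so it runs without dplyr; the column name `meanscore` and the helper variables `items` and `n_missing` are made up for illustration and are not part of the original code.

```r
dum <- data.frame(x = c(1, NA, 2, 3, 4), y = c(1, 2, 3, NA, 5), z = c(1, NA, 2, 3, 4))

# Columns that hold the question scores (assumed to be x, y and z, as above)
items <- dum[c("x", "y", "z")]

# Number of missing answers per row
n_missing <- rowSums(is.na(items))

# Mean of the answered questions, set to NA when 2 or more answers are missing
dum$meanscore <- replace(rowMeans(items, na.rm = TRUE), n_missing >= 2, NA)

dum
```

The threshold `>= 2` mirrors the one used for the sum score; rows with at most one missing answer get the mean of the remaining answers.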

CodePudding user response:

You can also try this:

library(dplyr)

dum <- tibble(x = c(1, NA, 2, 3, 4), y = c(1, 2, 3, NA, 5), z = c(1, NA, 2, 3, 4))
dum2 <- dum %>%
  mutate(sumsum = ifelse(rowSums(is.na(select(., x:z))) >= 2,
                         NA,
                         rowSums(select(., x:z), na.rm = TRUE)))
dum2
# A tibble: 5 × 4
#      x     y     z sumsum
#  <dbl> <dbl> <dbl>  <dbl>
#1     1     1     1      3
#2    NA     2    NA     NA
#3     2     3     2      7
#4     3    NA     3      6
#5     4     5     4     13
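
If the same rule is needed for several scales, the logic in the answers above could be wrapped in a small function. This is only a sketch: `row_sum_max_na` and its argument `max_na` are hypothetical names, not an existing dplyr or base R function, and the example uses a plain data.frame so that it runs without any packages.

```r
# Hypothetical helper: row sums over all columns of `items`, set to NA for
# rows with more than `max_na` missing values.
row_sum_max_na <- function(items, max_na) {
  m <- as.matrix(items)
  n_missing <- rowSums(is.na(m))       # missing values per row
  total <- rowSums(m, na.rm = TRUE)    # sum of the non-missing values
  total[n_missing > max_na] <- NA_real_
  unname(total)
}

dum <- data.frame(x = c(1, NA, 2, 3, 4), y = c(1, 2, 3, NA, 5), z = c(1, NA, 2, 3, 4))

# Allow at most one missing answer, equivalent to the `>= 2` rule above
dum$sumsum <- row_sum_max_na(dum[c("x", "y", "z")], max_na = 1)

dum
```

Inside a dplyr pipeline like the ones above, the same helper could be called as mutate(sumsum = row_sum_max_na(select(., x:z), max_na = 1)).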