Home > Blockchain >  Getting a mean value based on multiple columes value
Getting a mean value based on multiple columes value

Time:11-01

I have several variables with a large amount of missing data, I am trying to calculate a mean and standard deviation of Z scores across multiple columns, but only if they have >3 of the variables of interest (it does not matter which of the Z scores as long as there were three). The aim is to be able to say which subjects have a mean Z score one or more SDs away from a reference groups mean Z score. My data is structured like below (but a lot larger)

Subject Zscore1 Zscore2 Zscore3 Zscore4 Zscore5 Group
as4 -2.4 -1.2 NA -2.3 1.6 ER
as1 -2.6 -1.4 -0.7 -1.4 -1.4 TL
as8 -2.7 NA -0.1 NA -1.1 TL
as10 NA -1.7 -0.9 2.3 1.7 ER

So i would like to code such as:

IF there are 3 or more Z scores present What is the mean and SD of the Z-scores

Any help would be appreciated.

CodePudding user response:

Do you need something like this:

library(dplyr)

df %>%  
  mutate(not_na_count = rowSums(!is.na(select(., -c(Subject, Group))))) %>% 
  filter(not_na_count >3) %>% 
  rowwise() %>% 
  mutate(avg = mean(c_across(2:6), na.rm = TRUE),
         sd = sd(c_across(2:6), na.rm = TRUE)
  )
 Subject Zscore1 Zscore2 Zscore3 Zscore4 Zscore5 Group not_na_count   avg    sd
  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>        <dbl> <dbl> <dbl>
1 as4        -2.4    -1.2    NA      -2.3     1.6 ER               4 -1.08 1.86 
2 as1        -2.6    -1.4    -0.7    -1.4    -1.4 TL               5 -1.5  0.686
3 as10       NA      -1.7    -0.9     2.3     1.7 ER               4  0.35 1.95 
  • Related