Getting a mean value based on multiple columes value-CodePudding

I have several variables with a large amount of missing data, I am trying to calculate a mean and standard deviation of Z scores across multiple columns, but only if they have >3 of the variables of interest (it does not matter which of the Z scores as long as there were three). The aim is to be able to say which subjects have a mean Z score one or more SDs away from a reference groups mean Z score. My data is structured like below (but a lot larger)

Subject	Zscore1	Zscore2	Zscore3	Zscore4	Zscore5	Group
as4	-2.4	-1.2	NA	-2.3	1.6	ER
as1	-2.6	-1.4	-0.7	-1.4	-1.4	TL
as8	-2.7	NA	-0.1	NA	-1.1	TL
as10	NA	-1.7	-0.9	2.3	1.7	ER

So i would like to code such as:

IF there are 3 or more Z scores present What is the mean and SD of the Z-scores

Any help would be appreciated.

CodePudding user response：

Do you need something like this:

library(dplyr)

df %>%  
  mutate(not_na_count = rowSums(!is.na(select(., -c(Subject, Group))))) %>% 
  filter(not_na_count >3) %>% 
  rowwise() %>% 
  mutate(avg = mean(c_across(2:6), na.rm = TRUE),
         sd = sd(c_across(2:6), na.rm = TRUE)
  )

 Subject Zscore1 Zscore2 Zscore3 Zscore4 Zscore5 Group not_na_count   avg    sd
  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>        <dbl> <dbl> <dbl>
1 as4        -2.4    -1.2    NA      -2.3     1.6 ER               4 -1.08 1.86 
2 as1        -2.6    -1.4    -0.7    -1.4    -1.4 TL               5 -1.5  0.686
3 as10       NA      -1.7    -0.9     2.3     1.7 ER               4  0.35 1.95