R dplyr summarise mean and stdev using group

I have a dataframe that looks like this:

df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
                 "Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
                 "Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
                 "Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))

I'm trying to calculate mean and stdev of Frequency for combination of Experiment and Type, and I first tried it by running this line:

df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency)

If I run this, I get a tibble that looks like below:

Experiment   Type   mean   sd
Exp1         alpha  10     5
Exp1         beta   102.   3.54
Epx1         gamma  1000   NA

But I'd like R to think that all Type (alpha, beta, gamma) should exist for every combination of Experiment and Replicate, so that if there is no Frequency value for Type, R will use 0 instead of not including that value.

In other words, what I want needs to be calculated like below:

Experiment   Type   mean              sd
Exp1         alpha  mean(10,15,5)     sd(10,15,5)
Exp1         beta   mean(100,0,105)   sd(100,0,105)
Exp1         gamma  mean(1000,0,0)    sd(1000,0,0)

For example, for Exp1 beta, the summarise function I used above calculates mean(100,105) and sd(100,105) because Exp1 Replicate B doesn't exist in my df. But I want R to calculate mean(100,0,105) and sd(100,0,105) instead. Would anyone be able to give me some ideas on how to do this?

CodePudding user response：

You need to first complete your dataframe to fill in missing data with 0, then pipe the "completed" dataframe to your functions.

library(tidyverse)

df %>% 
  complete(Experiment, Type, Replicate, fill = list(Frequency = 0)) %>% 
  group_by(Experiment, Type) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")

# A tibble: 9 × 4
  Experiment Type    mean     sd
  <chr>      <chr>  <dbl>  <dbl>
1 Exp1       alpha  10      5   
2 Exp1       beta   68.3   59.2 
3 Exp1       gamma 333.   577.  
4 Exp2       alpha   3.33   5.77
5 Exp2       beta   66.7   58.0 
6 Exp2       gamma 677.   586.  
7 Exp3       alpha   8.33   7.64
8 Exp3       beta   33.3   57.7 
9 Exp3       gamma 330    572.

CodePudding user response：

You need to include Replicate in the group_by function and conver the output into a wider tibble. The number columns can be mutated by replacing NA values. Then, concatenating the mean and sd columns would give the desired output.

df %>% group_by(Experiment, Type, Replicate) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency)) %>% 
  pivot_wider(names_from = Replicate, values_from =  c(mean, sd)) %>% 
mutate(across(where(is.double),~ replace_na(.,0))) %>% 
  mutate(mean = paste0("mean(", mean_A, ",", mean_B, ",", mean_C, ")"),
         sd = paste0("sd(", sd_A, ",", sd_B, ",", sd_C, ")")) %>% 
  select(Experiment, Type, mean, sd)

The output is

# A tibble: 9 x 4
# Groups:   Experiment, Type [9]
  Experiment Type  mean              sd       
  <chr>      <chr> <chr>             <chr>    
1 Exp1       alpha mean(10,15,5)     sd(0,0,0)
2 Exp1       beta  mean(100,0,105)   sd(0,0,0)
3 Exp1       gamma mean(1000,0,0)    sd(0,0,0)
4 Exp2       alpha mean(10,0,0)      sd(0,0,0)
5 Exp2       beta  mean(0,95,105)    sd(0,0,0)
6 Exp2       gamma mean(1010,1020,0) sd(0,0,0)
7 Exp3       alpha mean(15,10,0)     sd(0,0,0)
8 Exp3       beta  mean(0,0,100)     sd(0,0,0)
9 Exp3       gamma mean(0,990,0)     sd(0,0,0)