Account for different means in one-sample t-tests by group in R

Time:08-31

Let's say I have the following data:

df <- data.frame(
    group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
    group2 = rep(c("A", "B", "C"), 5),
    temp = rnorm(3 * 5, 26, 6)  # 3 groups x 5 repeats = 15 rows
) 

I am trying to run one-sample t-tests by group2, using the group1 means as mu in t.test. This can be done for one group1 mean at a time, like this:

library(dplyr)
library(tidyr)
library(broom)

g1mean <- df %>% 
      group_by(group1) %>%
      summarise(mu0 = mean(temp))

g1mean
# A tibble: 3 x 2
#      group1   mu0
#      <chr>  <dbl>
#    1 Area_1  28.3
#    2 Area_2  24.3
#    3 Area_3  26.5
    
tArea1 <- df %>% 
      group_by(group2) %>%
      summarise(res = list(tidy(t.test(temp, mu=28.3)))) %>%
      unnest(cols = res)

But my data has more than 300 groups in group1 and 500 groups in group2, so I'm looking for a way to automate this.

CodePudding user response:

Here is a way that avoids creating a separate data set of means.

df <- data.frame(
  group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
  group2 = rep(c("A", "B", "C"), 5),
  temp = rnorm(3*5, 26, 6)
) 

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(broom)
})

df %>%
  group_by(group2) %>%
  mutate(mu0 = mean(temp)) %>%
  group_by(group1) %>%
  summarise(res = list(tidy(t.test(temp, mu = first(mu0))))) %>%
  unnest(cols = res)
#> # A tibble: 3 × 9
#>   group1 estimate statistic p.value parameter conf.low conf.high method         
#>   <chr>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>          
#> 1 Area_1     28.7         0       1         4     23.1      34.4 One Sample t-t…
#> 2 Area_2     25.2         0       1         4     18.7      31.6 One Sample t-t…
#> 3 Area_3     25.2         0       1         4     19.3      31.1 One Sample t-t…
#> # … with 1 more variable: alternative <chr>

Created on 2022-08-30 by the reprex package (v2.0.1)
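
The zero statistics and p-values of 1 in the output above follow from the sample data: group1 and group2 line up one-to-one (every Area_1 row is also an A row, and so on), so each test compares a sample to its own mean. A quick check of that reading, as a sketch; the seed and the names same_rows, x and stat are assumptions added for illustration:

```r
set.seed(42)  # assumption: seed added only so the check is reproducible

df <- data.frame(
  group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
  group2 = rep(c("A", "B", "C"), 5),
  temp = rnorm(3 * 5, 26, 6)
)

# In this sample, the "Area_1" rows and the "A" rows are the same rows.
same_rows <- identical(df$group1 == "Area_1", df$group2 == "A")

# Testing a sample against its own mean makes the numerator of t zero.
x <- df$temp[df$group1 == "Area_1"]
stat <- unname(t.test(x, mu = mean(df$temp[df$group2 == "A"]))$statistic)

same_rows
stat
```

Two caveats seem worth keeping in mind. The pipeline takes the mean by group2 and tests by group1, which is the reverse of the question; swapping the two group_by() calls gives the direction asked for. And when the two groupings cross, mu0 takes several values within one test group, so first(mu0) uses only the first of them.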

CodePudding user response:

Here's an approach using mapping over distinct pairs (group1, group2).

library(tidyverse)

df %>% 
  distinct(group1, group2) %>%
  mutate(
    res = map2(.x = group2, 
               .y = group1, 
               .f = ~ t.test(df$temp[df$group2 == .x], 
                             mu = mean(df$temp[df$group1 == .y])))
  )
# A tibble: 3 × 3
# Groups:   group2 [3]
  group2 group1 res    
  <chr>  <chr>  <list> 
1 A      Area_1 <htest>
2 B      Area_2 <htest>
3 C      Area_3 <htest>
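
The res column above holds raw htest objects. To get one row of statistics per pair, the results can be passed through broom::tidy() and unnested, as in the first answer. A minimal sketch, assuming the sample data from the question; the seed and the names g1_means, mu0 and results are illustrative and not part of the original answer:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: seed added only to make the sketch reproducible

df <- data.frame(
  group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
  group2 = rep(c("A", "B", "C"), 5),
  temp = rnorm(3 * 5, 26, 6)
)

# Compute each group1 mean once and keep it as a named lookup vector,
# so the mean is not recomputed for every (group1, group2) pair.
g1_means <- tapply(df$temp, df$group1, mean)

results <- df %>%
  distinct(group1, group2) %>%
  mutate(
    # mu for each pair: the mean of temp within that pair's group1
    mu0 = as.numeric(g1_means[as.character(group1)]),
    # one-sample t-test of the group2 observations against mu0, tidied
    res = map2(group2, mu0,
               ~ tidy(t.test(df$temp[df$group2 == .x], mu = .y)))
  ) %>%
  unnest(cols = res)

results
```

With the sample data the pairs are only (Area_1, A), (Area_2, B) and (Area_3, C), so results has three rows; on data where the groupings cross, distinct() yields one row per pair that actually occurs.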