Let's say I have the following data:
df <- data.frame(
group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
group2 = rep(c("A", "B", "C"), 5),
temp = rnorm(3 * 5, 26, 6)  # 15 values: 3 areas x 5 repeats
)
I am trying to do one-sample t-tests by group2, using the means by group1 as the mu in t.test.
This can be done individually (for each group) like this:
library(dplyr)
library(tidyr)
library(broom)
g1mean <- df %>%
group_by(group1) %>%
summarise(mu0 = mean(temp))
g1mean
# A tibble: 3 x 2
# group1 mu0
# <chr> <dbl>
# 1 Area_1 28.3
# 2 Area_2 24.3
# 3 Area_3 26.5
tArea1 <- df %>%
group_by(group2) %>%
summarise(res = list(tidy(t.test(temp, mu=28.3)))) %>%
unnest(cols = res)
However, my real data has more than 300 groups in group1 and 500 groups in group2, so I'm looking for a way to automate this.
CodePudding user response:
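Here is a way that avoids creating the means data set. As written, it takes mu0 from the group2 means and runs the tests by group1. In the sample data group1 and group2 coincide row for row, so each group is tested against its own mean, which is why every statistic in the output is 0 and every p-value is 1.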
df <- data.frame(
group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
group2 = rep(c("A", "B", "C"), 5),
temp = rnorm(3*5, 26, 6)
)
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
library(broom)
})
df %>%
group_by(group2) %>%
mutate(mu0 = mean(temp)) %>%
group_by(group1) %>%
summarise(res = list(tidy(t.test(temp, mu = first(mu0))))) %>%
unnest(cols = res)
#> # A tibble: 3 × 9
#> group1 estimate statistic p.value parameter conf.low conf.high method
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area_1 28.7 0 1 4 23.1 34.4 One Sample t-t…
#> 2 Area_2 25.2 0 1 4 18.7 31.6 One Sample t-t…
#> 3 Area_3 25.2 0 1 4 19.3 31.1 One Sample t-t…
#> # … with 1 more variable: alternative <chr>
Created on 2022-08-30 by the reprex package (v2.0.1)
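If the aim is what the question's tArea1 example does, repeated for every group1 mean (so that each group2 is tested against each group1 mean), one possible sketch is a cross join of the observations with the means table. It assumes dplyr 1.1.0 or later for cross_join(); the set.seed() call and the name all_tests are not from the original and are only for illustration.

```r
library(dplyr)
library(tidyr)
library(broom)

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible
df <- data.frame(
  group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
  group2 = rep(c("A", "B", "C"), 5),
  temp   = rnorm(3 * 5, 26, 6)
)

# One row per group1 with its mean, used later as mu
g1mean <- df %>%
  group_by(group1) %>%
  summarise(mu0 = mean(temp))

# Pair every observation with every group1 mean, then test within each pair
all_tests <- df %>%
  select(group2, temp) %>%
  cross_join(g1mean) %>%            # assumption: dplyr >= 1.1.0
  group_by(group1, group2) %>%
  summarise(res = list(tidy(t.test(temp, mu = first(mu0)))),
            .groups = "drop") %>%
  unnest(cols = res)

all_tests
```

The result has one row per (group1, group2) combination, 9 for this sample. The cross join copies every observation once per group1 level, so with 300 levels the intermediate table may get large.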
CodePudding user response:
Here's an approach using mapping over distinct pairs (group1, group2).
library(tidyverse)
df %>%
distinct(group1, group2) %>%
mutate(
res = map2(.x = group2,
.y = group1,
.f = ~ t.test(df$temp[df$group2 == .x],
mu = mean(df$temp[df$group1 == .y])))
)
# A tibble: 3 × 3
# Groups: group2 [3]
group2 group1 res
<chr> <chr> <list>
1 A Area_1 <htest>
2 B Area_2 <htest>
3 C Area_3 <htest>
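The res column above holds raw htest objects. To get one row of statistics per pair, a possible follow-up is to pass each result through broom::tidy() and unnest it. This is only a sketch: the seed and the name out are assumptions for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible
df <- data.frame(
  group1 = rep(c("Area_1", "Area_2", "Area_3"), 5),
  group2 = rep(c("A", "B", "C"), 5),
  temp   = rnorm(3 * 5, 26, 6)
)

out <- df %>%
  distinct(group1, group2) %>%
  mutate(
    # For each observed pair: test the group2 values against the group1 mean,
    # then convert the htest object to a one-row tibble
    res = map2(
      group2, group1,
      ~ tidy(t.test(df$temp[df$group2 == .x],
                    mu = mean(df$temp[df$group1 == .y])))
    )
  ) %>%
  unnest(cols = res)

out
```

Each row of out then carries estimate, statistic, p.value and the confidence limits next to its group1 and group2 labels.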