I'm trying to figure out if there is a straightforward way to create a table of paired t-tests using tidyverse packages. There are already Q&As addressing this topic (e.g., here), but the existing answers all seem pretty convoluted.
Here's a reproducible example showing what I'm trying to accomplish -- a column of variable names, columns with the means for both items in the pair for each variable, and a column of p-values:
library(dplyr)
library(infer)
library(tidyr)
df <- mtcars %>%
  mutate(engine = if_else(vs == 0, "V-shaped", "straight"))

v_shaped <- df %>%
  filter(engine == "V-shaped") %>%
  summarise(across(c(mpg, disp), mean)) %>%
  pivot_longer(cols = everything()) %>%
  rename(V_shaped = value)

straight <- df %>%
  filter(engine == "straight") %>%
  summarise(across(c(mpg, disp), mean)) %>%
  pivot_longer(cols = everything()) %>%
  rename(straight = value)

mpg <- df %>%
  t_test(formula = mpg ~ engine, alternative = "two-sided") %>%
  select(p_value) %>%
  mutate(name = "mpg")

disp <- df %>%
  t_test(formula = disp ~ engine, alternative = "two-sided") %>%
  select(p_value) %>%
  mutate(name = "disp")

p_values <- bind_rows(mpg, disp)

table <- v_shaped %>%
  full_join(straight, by = "name") %>%
  full_join(p_values, by = "name")

table
#> # A tibble: 2 × 4
#> name V_shaped straight p_value
#> <chr> <dbl> <dbl> <dbl>
#> 1 mpg 16.6 24.6 0.000110
#> 2 disp 307. 132. 0.00000248
Obviously, this is not a good way to address this problem even for two variables, and it certainly does not scale well. But it does illustrate the intended outcome. Is there a way to do this in one pipeline? My actual use case involves many more variables, so -- ideally -- I'd be able to feed a vector of variable names into the pipe.
CodePudding user response:
Here's one way of doing it in a single pipe:
library(tidyverse)
library(infer)
df %>%
  # Select the columns you are interested in
  select(mpg, disp, engine) %>%
  # Reshape to long format
  pivot_longer(cols = -engine) %>%
  # Split the data into a list of data frames, one per variable
  split(.$name) %>%
  # For each data frame
  map_df(~ {
    # Get the mean value for each engine type
    .x %>%
      group_by(engine) %>%
      summarise(value = mean(value)) %>%
      # Reshape to wide format
      pivot_wider(names_from = engine) %>%
      # Combine with the t_test() result
      bind_cols(t_test(.x, formula = value ~ engine, alternative = "two-sided"))
  }, .id = "name")
# name `V-shaped` straight statistic t_df p_value alternative lower_ci upper_ci
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#1 disp 307. 132. -5.94 27.0 0.00000248 two.sided -235. -114.
#2 mpg 16.6 24.6 4.67 22.7 0.000110 two.sided 4.42 11.5
You can then use select() to keep only the columns that are relevant to you.
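Since the question asks about feeding in a vector of variable names, here is a minimal sketch of how the pipe above might be adapted. The `vars` vector is a hypothetical example, `all_of()` is used to select columns by name, and the `order` argument is my addition (it should only silence infer's warning about group order and fix the sign of the statistic, not change the two-sided p-value). The final select() trims the result to the columns shown in the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(infer)

df <- mtcars %>%
  mutate(engine = if_else(vs == 0, "V-shaped", "straight"))

# Hypothetical vector of variable names to test
vars <- c("mpg", "disp", "hp")

result <- df %>%
  # Select columns by name from the character vector
  select(all_of(vars), engine) %>%
  pivot_longer(cols = -engine) %>%
  split(.$name) %>%
  map_df(~ {
    .x %>%
      group_by(engine) %>%
      summarise(value = mean(value)) %>%
      pivot_wider(names_from = engine) %>%
      bind_cols(t_test(.x,
                       formula = value ~ engine,
                       order = c("V-shaped", "straight"),  # assumption: explicit group order
                       alternative = "two-sided"))
  }, .id = "name") %>%
  # Keep only the columns from the intended table
  select(name, `V-shaped`, straight, p_value)

result
```

Because split() orders the pieces by variable name, the rows come back alphabetically rather than in the order given in `vars`.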
CodePudding user response:
This can be achieved with a call to summarise() and a flattening of the results. If we use stats::t.test(), the group means are returned as part of the test and don't need to be calculated separately. broom::tidy() ensures that each result is returned in a 1-row tibble.
library(dplyr)
library(purrr)
library(broom)
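mtcars %>%
  # Run a Welch t-test of every column except vs, grouped by vs;
  # wrap each tidied result in a list so it fits in a single cell
  summarise(across(-vs, \(x)
    list(tidy(
      t.test(x ~ vs, data = ., alternative = "two.sided")
    )))) %>%
  # Stack the 1-row tibbles, keeping the variable name in "names"
  flatten_dfr(.id = "names") %>%
  # vs == 0 is V-shaped (estimate1), vs == 1 is straight (estimate2)
  rename("V-shaped" = estimate1, straight = estimate2)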
# A tibble: 10 × 11
names estimate `V-shaped` straight statistic p.value parameter conf.low conf.high method alternative
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 mpg -7.94 16.6 24.6 -4.67 0.000110 22.7 -11.5 -4.42 Welch Two Sample t-test two.sided
2 cyl 2.87 7.44 4.57 7.79 0.0000000112 29.9 2.12 3.63 Welch Two Sample t-test two.sided
3 disp 175. 307. 132. 5.94 0.00000248 27.0 114. 235. Welch Two Sample t-test two.sided
4 hp 98.4 190. 91.4 6.29 0.00000182 23.6 66.1 131. Welch Two Sample t-test two.sided
5 drat -0.467 3.39 3.86 -2.66 0.0129 27.1 -0.827 -0.107 Welch Two Sample t-test two.sided
6 wt 1.08 3.69 2.61 3.76 0.000728 30.0 0.493 1.66 Welch Two Sample t-test two.sided
7 qsec -2.64 16.7 19.3 -5.94 0.00000352 24.6 -3.56 -1.72 Welch Two Sample t-test two.sided
8 am -0.167 0.333 0.5 -0.927 0.362 27.1 -0.535 0.202 Welch Two Sample t-test two.sided
9 gear -0.302 3.56 3.86 -1.22 0.232 28.8 -0.807 0.204 Welch Two Sample t-test two.sided
10 carb 1.83 3.61 1.79 3.98 0.000413 29.6 0.888 2.76 Welch Two Sample t-test two.sided
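If you only want some of the variables, the same idea can be sketched by mapping over a character vector of names instead of using across(). This is a variation under stated assumptions rather than the code above: `vars` is a hypothetical vector, and bind_rows() stands in for flatten_dfr(), which, as far as I know, is marked as superseded in recent purrr releases. The final select() renames and keeps only the columns from the table in the question.

```r
library(dplyr)
library(purrr)
library(broom)

# Hypothetical vector of variable names to test
vars <- c("mpg", "disp")

result <- vars %>%
  # Name the vector by itself so the names survive as an id column
  set_names() %>%
  # One Welch t-test per variable, each tidied into a 1-row tibble
  map(\(v) tidy(t.test(mtcars[[v]] ~ mtcars$vs, alternative = "two.sided"))) %>%
  # Stack the rows, storing the variable name in "name"
  bind_rows(.id = "name") %>%
  # estimate1 is the vs == 0 (V-shaped) mean, estimate2 the vs == 1 (straight) mean
  select(name, `V-shaped` = estimate1, straight = estimate2, p_value = p.value)

result
```

Here the rows follow the order of `vars`, so the table can be reordered simply by reordering the vector.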