How to create a tidy table of t-tests?

I'm trying to figure out whether there is a straightforward way to create a table of two-sample t-tests (one test per variable) using tidyverse packages. There are already Q&As addressing this topic (e.g., here), but the existing answers all seem pretty convoluted.

Here's a reproducible example showing what I'm trying to accomplish -- a column of variable names, one column per group holding each variable's mean in that group, and a column of p-values:

library(dplyr)
library(infer)
library(tidyr)

df <- mtcars %>% 
  mutate(engine = if_else(vs == 0, "V-shaped", "straight"))

v_shaped <- df %>% 
  filter(engine == "V-shaped") %>% 
  summarise(across(c(mpg, disp), mean)) %>% 
  pivot_longer(cols = everything()) %>% 
  rename(V_shaped = value)

straight <- df %>% 
  filter(engine == "straight") %>% 
  summarise(across(c(mpg, disp), mean)) %>% 
  pivot_longer(cols = everything()) %>% 
  rename(straight = value)

mpg <- df %>% 
  t_test(formula = mpg ~ engine, alternative = "two-sided") %>% 
  select(p_value) %>% 
  mutate(name = "mpg")

disp <- df %>% 
  t_test(formula = disp ~ engine, alternative = "two-sided") %>% 
  select(p_value) %>% 
  mutate(name = "disp")

p_values <- bind_rows(mpg, disp)

table <- v_shaped %>% 
  full_join(straight, by = "name") %>% 
  full_join(p_values, by = "name") 

table

#> # A tibble: 2 × 4
#>   name  V_shaped straight    p_value
#>   <chr>    <dbl>    <dbl>      <dbl>
#> 1 mpg       16.6     24.6 0.000110  
#> 2 disp     307.     132.  0.00000248

Obviously, this is not a good way to address this problem even for two variables, and it certainly does not scale well. But it does illustrate the intended outcome. Is there a way to do this in one pipeline? My actual use case involves many more variables, so -- ideally -- I'd be able to feed a vector of variable names into the pipe.

Answer:

Here's one way of doing it in a single pipeline, reusing `df` from the question:

library(tidyverse)
library(infer)
  
df %>%
  # select the columns you are interested in
  select(mpg, disp, engine) %>%
  # get them in long format
  pivot_longer(cols = -engine) %>%
  # split the data into a list of data frames, one per variable
  split(.$name) %>%
  # for each data frame
  map_df(~{
    # get the mean value for each engine type
    .x %>% 
      group_by(engine) %>% 
      summarise(value = mean(value)) %>%
      # get the data in wide format
      pivot_wider(names_from = engine) %>%
      # combine it with the infer::t_test() result
      bind_cols(t_test(.x, formula = value ~ engine, alternative = "two-sided"))
    }, .id = "name")

#   name  `V-shaped` straight statistic  t_df    p_value alternative lower_ci upper_ci
#  <chr>      <dbl>    <dbl>     <dbl> <dbl>      <dbl> <chr>          <dbl>    <dbl>
#1 disp       307.     132.      -5.94  27.0 0.00000248 two.sided    -235.     -114. 
#2 mpg         16.6     24.6      4.67  22.7 0.000110   two.sided       4.42     11.5

Obviously, you can keep only the columns that are relevant to you (using select).
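Since the question asks to feed a vector of variable names into the pipe, here is a minimal sketch that wraps the pipeline above in a function and ends with the `select()` step. `t_test_table()` is a hypothetical helper name, not part of any package; the sketch assumes the grouping column has exactly two distinct values and that infer's result contains a `p_value` column, as in the output above. Passing `order` only silences infer's message about the subtraction order, since the test statistic is dropped anyway.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(infer)

# Hypothetical helper (not from any package): one row per variable in `vars`,
# with the mean of each group and the p-value from infer::t_test().
t_test_table <- function(data, vars, group) {
  # the two group labels; assumed to be exactly two
  lvls <- sort(unique(as.character(data[[group]])))
  stopifnot(length(lvls) == 2)

  data %>%
    select(all_of(c(vars, group))) %>%
    # long format: one `name`/`value` pair per observation and variable
    pivot_longer(cols = all_of(vars)) %>%
    split(.$name) %>%
    map_df(~ {
      .x %>%
        group_by(across(all_of(group))) %>%
        summarise(value = mean(value), .groups = "drop") %>%
        # one column per group holding that group's mean
        pivot_wider(names_from = all_of(group), values_from = value) %>%
        bind_cols(t_test(.x, formula = reformulate(group, response = "value"),
                         order = lvls, alternative = "two-sided"))
    }, .id = "name") %>%
    # keep only the variable name, the two group means and the p-value
    select(name, all_of(lvls), p_value)
}

df <- mtcars %>%
  mutate(engine = if_else(vs == 0, "V-shaped", "straight"))

res <- t_test_table(df, vars = c("mpg", "disp"), group = "engine")
res
```

Adding a variable to the table is then just a matter of extending the `vars` vector.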

Answer:

This can be achieved with a call to summarise() and a flattening of the results. If we use stats::t.test(), the group means are returned as part of the test and don't need to be calculated separately. broom::tidy() ensures that each result is returned as a 1-row tibble. (The `\(x)` lambda shorthand requires R 4.1 or later.)

library(dplyr)
library(purrr)
library(broom)

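mtcars %>%
  # for every column except vs, run a Welch t-test against vs and store
  # the 1-row tidy() result in a list-column
  summarise(across(-vs, \(x)
                   list(tidy(
                     t.test(x ~ vs, data = .,  alternative = "two.sided")
                   )))) %>%
  # turn the list-columns into one row per variable
  flatten_dfr(.id = "names") %>%
  # estimate1 is the vs == 0 mean, estimate2 the vs == 1 mean
  rename("V-shaped" = estimate1, straight = estimate2)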

# A tibble: 10 × 11
   names estimate `V-shaped` straight statistic      p.value parameter conf.low conf.high method                  alternative
   <chr>    <dbl>      <dbl>    <dbl>     <dbl>        <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>      
 1 mpg     -7.94      16.6      24.6     -4.67  0.000110          22.7  -11.5      -4.42  Welch Two Sample t-test two.sided  
 2 cyl      2.87       7.44      4.57     7.79  0.0000000112      29.9    2.12      3.63  Welch Two Sample t-test two.sided  
 3 disp   175.       307.      132.       5.94  0.00000248        27.0  114.      235.    Welch Two Sample t-test two.sided  
 4 hp      98.4      190.       91.4      6.29  0.00000182        23.6   66.1     131.    Welch Two Sample t-test two.sided  
 5 drat    -0.467      3.39      3.86    -2.66  0.0129            27.1   -0.827    -0.107 Welch Two Sample t-test two.sided  
 6 wt       1.08       3.69      2.61     3.76  0.000728          30.0    0.493     1.66  Welch Two Sample t-test two.sided  
 7 qsec    -2.64      16.7      19.3     -5.94  0.00000352        24.6   -3.56     -1.72  Welch Two Sample t-test two.sided  
 8 am      -0.167      0.333     0.5     -0.927 0.362             27.1   -0.535     0.202 Welch Two Sample t-test two.sided  
 9 gear    -0.302      3.56      3.86    -1.22  0.232             28.8   -0.807     0.204 Welch Two Sample t-test two.sided  
10 carb     1.83       3.61      1.79     3.98  0.000413          29.6    0.888     2.76  Welch Two Sample t-test two.sided
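To restrict this to a chosen vector of variable names, as the question asks, one option is to iterate over the names instead of over columns. The sketch below swaps the answer's summarise() and flatten_dfr() combination for purrr::map_dfr() plus `reformulate()`; that substitution is ours, not the answerer's code. `vars` is an illustrative choice of variables, and the final `select()` keeps only the columns shown in the question's table.

```r
library(dplyr)
library(purrr)
library(broom)

# Variables to test against vs (illustrative choice)
vars <- c("mpg", "disp")

res2 <- map_dfr(
  set_names(vars),                      # names become the .id values
  \(v) tidy(t.test(reformulate("vs", response = v),
                   data = mtcars, alternative = "two.sided")),
  .id = "names"
) %>%
  # estimate1 is the vs == 0 mean, estimate2 the vs == 1 mean
  rename("V-shaped" = estimate1, straight = estimate2) %>%
  select(names, `V-shaped`, straight, p.value)

res2
```

Because `reformulate()` builds each formula from a string, `vars` can come from anywhere, e.g. `setdiff(names(mtcars), "vs")` to reproduce the full table above.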