How to create a tidy table of t-tests?

Time:01-13

I'm trying to figure out if there is a straightforward way to create a table of t-tests, each comparing a pair of groups, using tidyverse packages. There are already Q&As addressing this topic (e.g., here), but the existing answers all seem pretty convoluted.

Here's a reproducible example showing what I'm trying to accomplish -- a column of variable names, columns with the means for both items in the pair for each variable, and a column of p-values:

library(dplyr)
library(infer)
library(tidyr)

df <- mtcars %>% 
  mutate(engine = if_else(vs == 0, "V-shaped", "straight"))

v_shaped <- df %>% 
  filter(engine == "V-shaped") %>% 
  summarise(across(c(mpg, disp), mean)) %>% 
  pivot_longer(cols = everything()) %>% 
  rename(V_shaped = value)

straight <- df %>% 
  filter(engine == "straight") %>% 
  summarise(across(c(mpg, disp), mean)) %>% 
  pivot_longer(cols = everything()) %>% 
  rename(straight = value)

mpg <- df %>% 
  t_test(formula = mpg ~ engine, alternative = "two-sided") %>% 
  select(p_value) %>% 
  mutate(name = "mpg")

disp <- df %>% 
  t_test(formula = disp ~ engine, alternative = "two-sided") %>% 
  select(p_value) %>% 
  mutate(name = "disp")

p_values <- bind_rows(mpg, disp)

table <- v_shaped %>% 
  full_join(straight, by = "name") %>% 
  full_join(p_values, by = "name") 

table

#> # A tibble: 2 × 4
#>   name  V_shaped straight    p_value
#>   <chr>    <dbl>    <dbl>      <dbl>
#> 1 mpg       16.6     24.6 0.000110  
#> 2 disp     307.     132.  0.00000248

Obviously, this is not a good way to address this problem even for two variables, and it certainly does not scale well. But it does illustrate the intended outcome. Is there a way to do this in one pipeline? My actual use case involves many more variables, so -- ideally -- I'd be able to feed a vector of variable names into the pipe.

CodePudding user response:

Here's one way of doing it in a single pipe:

library(tidyverse)
library(infer)
  
df %>%
  #select the columns you are interested in
  select(mpg, disp, engine) %>%
  #get them in long format
  pivot_longer(cols = -engine) %>%
  #Divide the data in a list of dataframes
  split(.$name) %>%
  #For each dataframe
  map_df(~{
    #Get the mean value for each engine 
    .x %>% 
      group_by(engine) %>% 
      summarise(value = mean(value)) %>%
      #get the data in wide format
      pivot_wider(names_from = engine) %>%
      #Combine it with t.test result
      bind_cols(t_test(.x, formula = value ~ engine, alternative = "two-sided"))
    }, .id = "name")

#   name  `V-shaped` straight statistic  t_df    p_value alternative lower_ci upper_ci
#  <chr>      <dbl>    <dbl>     <dbl> <dbl>      <dbl> <chr>          <dbl>    <dbl>
#1 disp       307.     132.      -5.94  27.0 0.00000248 two.sided    -235.     -114. 
#2 mpg         16.6     24.6      4.67  22.7 0.000110   two.sided       4.42     11.5

You can then use select() to keep only the columns that are relevant to you.
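As a minimal sketch of that trimming step, and assuming the variables of interest are held in a character vector (here a hypothetical `vars`), the same pipeline can be narrowed to the four columns the question asks for. The `order` argument is added only to make the direction of subtraction explicit; it should not change the p-values.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(infer)

# Rebuild the question's data frame so the sketch is self-contained
df <- mtcars %>%
  mutate(engine = if_else(vs == 0, "V-shaped", "straight"))

# Hypothetical vector of variable names to test (an assumption, not from the answer)
vars <- c("mpg", "disp")

result <- df %>%
  # keep only the requested variables plus the grouping column
  select(all_of(vars), engine) %>%
  pivot_longer(cols = -engine) %>%
  split(.$name) %>%
  map_df(~ {
    .x %>%
      group_by(engine) %>%
      summarise(value = mean(value)) %>%
      pivot_wider(names_from = engine) %>%
      bind_cols(t_test(.x, formula = value ~ engine,
                       order = c("straight", "V-shaped"),
                       alternative = "two-sided"))
  }, .id = "name") %>%
  # keep only the columns shown in the question's target table
  select(name, `V-shaped`, straight, p_value)

result
```

Adding more names to `vars` extends the table by one row per variable, provided all of the selected columns are numeric so that `pivot_longer()` can stack them.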

CodePudding user response:

This can be achieved with a call to summarise() and a flattening of the results. If we use stats::t.test() the group means are returned as part of the test and don't need to be calculated separately. broom::tidy() ensures that each result is returned in a 1-row tibble.

library(dplyr)
library(purrr)
library(broom)

mtcars %>%
  summarise(across(-vs, \(x)
                   list(tidy(
                     t.test(x ~ vs, data = .,  alternative = "two.sided")
                   )))) %>%
  flatten_dfr(.id = "names") %>%
  rename("V-shaped" = estimate1, straight = estimate2)

# A tibble: 10 × 11
   names estimate `V-shaped` straight statistic      p.value parameter conf.low conf.high method                  alternative
   <chr>    <dbl>      <dbl>    <dbl>     <dbl>        <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>      
 1 mpg     -7.94      16.6      24.6     -4.67  0.000110          22.7  -11.5      -4.42  Welch Two Sample t-test two.sided  
 2 cyl      2.87       7.44      4.57     7.79  0.0000000112      29.9    2.12      3.63  Welch Two Sample t-test two.sided  
 3 disp   175.       307.      132.       5.94  0.00000248        27.0  114.      235.    Welch Two Sample t-test two.sided  
 4 hp      98.4      190.       91.4      6.29  0.00000182        23.6   66.1     131.    Welch Two Sample t-test two.sided  
 5 drat    -0.467      3.39      3.86    -2.66  0.0129            27.1   -0.827    -0.107 Welch Two Sample t-test two.sided  
 6 wt       1.08       3.69      2.61     3.76  0.000728          30.0    0.493     1.66  Welch Two Sample t-test two.sided  
 7 qsec    -2.64      16.7      19.3     -5.94  0.00000352        24.6   -3.56     -1.72  Welch Two Sample t-test two.sided  
 8 am      -0.167      0.333     0.5     -0.927 0.362             27.1   -0.535     0.202 Welch Two Sample t-test two.sided  
 9 gear    -0.302      3.56      3.86    -1.22  0.232             28.8   -0.807     0.204 Welch Two Sample t-test two.sided  
10 carb     1.83       3.61      1.79     3.98  0.000413          29.6    0.888     2.76  Welch Two Sample t-test two.sided 
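
If only some variables are needed, one possible variation (a sketch, not part of the answer above) is to loop over a character vector of names and build each formula with `reformulate()`. The vector `vars` and the output column names are hypothetical choices, and `list_rbind()` assumes purrr 1.0.0 or later.

```r
library(dplyr)
library(purrr)
library(broom)

# Hypothetical vector of variable names to test
vars <- c("mpg", "disp")

result <- vars %>%
  # name the vector by itself so the names can become a column later
  set_names() %>%
  # build e.g. mpg ~ vs for each name and run a Welch two-sample t-test
  map(\(v) tidy(t.test(reformulate("vs", response = v), data = mtcars))) %>%
  # stack the 1-row tibbles, storing the list names in a "name" column
  list_rbind(names_to = "name") %>%
  # vs == 0 is the first group (V-shaped), vs == 1 the second (straight)
  select(name, `V-shaped` = estimate1, straight = estimate2, p_value = p.value)

result
```

This keeps the group means from `t.test()` itself, as in the answer, while limiting the table to the name, the two means, and the p-value.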