Nest or group a dataframe, perform t tests (t.test) between all comparisons and return the p values

Time:11-23

Let's say I have a data frame in which the grouping variable is quest. I would like to nest my df by this variable, run a t.test for every pairwise comparison of the item columns within each group, and get the p values. The closest approach that I found is this one here

The long way should be as follows

library(dplyr)

# quest is recycled to 12 rows, so each quest value gets 4 rows
df <- data.frame(quest = c("2","4","6"), 
                 item_1 = rnorm(12,5,1),
                 item_2 = rnorm(12,5,1),
                 item_3 = rnorm(12,5,1))

# item_1 vs item_2
df %>% 
  filter(quest == 2) %>%
  summarise(p = t.test(item_1, item_2, paired = TRUE)$p.value)

# item_1 vs item_3
df %>% 
  filter(quest == 2) %>%
  summarise(p = t.test(item_1, item_3, paired = TRUE)$p.value)

# item_2 vs item_3
df %>% 
  filter(quest == 2) %>%
  summarise(p = t.test(item_2, item_3, paired = TRUE)$p.value)

After this, I would have to repeat the same three tests for quest == 4 and quest == 6.

I have a gut feeling that nest will resolve this issue. Something like this:

df %>% 
  nest_by(quest) %>% 
  # all t tests here, with p values
  unnest()

I would like to use only tidyverse functions. Thank you.

CodePudding user response:

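We may use combn to build every pairwise combination of the item columns within each nested group

library(dplyr)
df %>%
    nest_by(quest) %>% 
    # data holds the item columns of the current quest;
    # in dplyr >= 1.1.0, reframe() is the recommended verb for multi-row results
    summarise(categ = combn(names(data), 2, paste, collapse = "_"), 
       pval = combn(data, 2, function(x) 
          t.test(x[[1]], x[[2]], paired = TRUE)$p.value), .groups = 'drop')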

Output:

# A tibble: 9 × 3
  quest categ          pval
  <chr> <chr>         <dbl>
1 2     item_1_item_2 0.581
2 2     item_1_item_3 0.416
3 2     item_2_item_3 0.263
4 4     item_1_item_2 0.886
5 4     item_1_item_3 0.101
6 4     item_2_item_3 0.106
7 6     item_1_item_2 0.426
8 6     item_1_item_3 0.983
9 6     item_2_item_3 0.362
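
To see what the two combn calls are doing, it may help to run them outside the pipeline on a single group. The following is a minimal base R sketch, not part of the original answer; the toy data frame d and the seed are assumptions made for illustration. combn treats a data frame as a list of columns, so each x handed to the function is a two-column data frame.

```r
# Toy data standing in for one nested group (hypothetical values)
set.seed(1)
d <- data.frame(item_1 = rnorm(4, 5, 1),
                item_2 = rnorm(4, 5, 1),
                item_3 = rnorm(4, 5, 1))

# Labels for each pair of columns, in the order combn() visits them
categ <- combn(names(d), 2, paste, collapse = "_")

# Each pair of columns arrives as a 2-column data frame x;
# run a paired t test on it and keep only the p value
pval <- combn(d, 2, function(x)
  t.test(x[[1]], x[[2]], paired = TRUE)$p.value)

# One named p value per pair
res <- setNames(as.vector(pval), categ)
res
```

Because both calls iterate over the pairs in the same order, the labels and the p values line up, which is what lets the answer place them side by side in summarise.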

Or reshape to 'long' format with pivot_longer and use pairwise.t.test

library(tidyr)
df %>% 
   pivot_longer(cols = -quest) %>%
   group_by(quest) %>% 
   summarise(pout = list(broom::tidy(pairwise.t.test(value, name, 
        p.adjust.method = "none", paired = TRUE)))) %>% 
   unnest(pout)

Output:

# A tibble: 9 × 4
  quest group1 group2 p.value
  <chr> <chr>  <chr>    <dbl>
1 2     item_2 item_1   0.581
2 2     item_3 item_1   0.416
3 2     item_3 item_2   0.263
4 4     item_2 item_1   0.886
5 4     item_3 item_1   0.101
6 4     item_3 item_2   0.106
7 6     item_2 item_1   0.426
8 6     item_3 item_1   0.983
9 6     item_3 item_2   0.362

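NOTE: pairwise.t.test adjusts the p values for multiple comparisons by default (p.adjust.method = "holm"). Here we set p.adjust.method = "none" so the values match the unadjusted combn output.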
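
The effect of the adjustment can be checked with a small base R sketch. The vectors value and name below are made-up stand-ins for one quest group in long format, and the seed is an assumption for illustration only. With "none", each entry should equal the corresponding individual paired t.test; the default should equal p.adjust(..., "holm") applied to those raw values.

```r
# Hypothetical long-format data for a single group: 4 rows x 3 items
set.seed(1)
value <- rnorm(12, 5, 1)
name  <- rep(c("item_1", "item_2", "item_3"), times = 4)

# Unadjusted pairwise paired t tests (lower-triangular matrix of p values)
raw <- pairwise.t.test(value, name,
                       p.adjust.method = "none", paired = TRUE)$p.value

# Same tests with the default adjustment ("holm")
holm <- pairwise.t.test(value, name, paired = TRUE)$p.value

# A single unadjusted entry matches a direct paired t test
direct <- t.test(value[name == "item_2"], value[name == "item_1"],
                 paired = TRUE)$p.value

# The default output is the raw p values passed through p.adjust()
lower <- lower.tri(raw, diag = TRUE)
manual_holm <- p.adjust(raw[lower], method = "holm")

raw
holm
```

If adjusted p values are wanted in the combn version instead, p.adjust could be applied to the pval column within each quest.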
