Tidy way to calculate wilcoxon test on multiple group splits and preserve original group information

Time:09-06

I am looking for a way to use group_split() or summarise() syntax to run a test on each group while preserving the original group information. Previous answers I've seen use these approaches, but they don't keep the grouping columns in the output. Is there a way to do this? I could of course join the group information back onto the results, but I was hoping to avoid that.

> library(dplyr)
> library(purrr)
> set.seed(22)
> # Create fake data
> flavor <- data.frame(
    temperature = sample(x = c('hot','cold'), size = 500, replace = TRUE),
    color = sample(c('red','blue','green'), 500, TRUE),
    texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
    flavor = sample.int(n = 100, size = 500, replace = TRUE)
  )
> 
> head(flavor, 10)
   temperature color texture flavor
1         cold   red    soft     47
2          hot   red crumbly      2
3         cold  blue  crispy     28
4         cold  blue    soft     36
5         cold  blue crumbly     69
6         cold   red    soft     49
7         cold  blue    soft    100
8          hot  blue crumbly     42
9          hot  blue    soft     93
10         hot green     wet     47

Using base::split() with purrr::map() (works, but doesn't preserve the original group information)

> flavor %>%
    group_by(color, texture) %>%
    mutate(subsets = cur_group_id()) %>%
    ungroup() %>%
    base::split(.$subsets) %>%
    purrr::map(~ wilcox.test(flavor ~ temperature, data = .)) %>%
    purrr::map_dfr(~ broom::tidy(.))
# A tibble: 12 × 4
   statistic p.value method                                            alternative
       <dbl>   <dbl> <chr>                                             <chr>      
 1      237   0.687  Wilcoxon rank sum test with continuity correction two.sided  
 2      152.  0.866  Wilcoxon rank sum test with continuity correction two.sided  
 3      236.  0.696  Wilcoxon rank sum test with continuity correction two.sided  
 4      308   0.216  Wilcoxon rank sum test with continuity correction two.sided  
 5      256   0.281  Wilcoxon rank sum test with continuity correction two.sided  
 6      122   0.540  Wilcoxon rank sum test with continuity correction two.sided  
 7      244   0.742  Wilcoxon rank sum test with continuity correction two.sided  
 8      130.  0.0393 Wilcoxon rank sum test with continuity correction two.sided  
 9      238.  0.317  Wilcoxon rank sum test with continuity correction two.sided  
10      360.  0.345  Wilcoxon rank sum test with continuity correction two.sided  
11       75   0.0292 Wilcoxon rank sum test with continuity correction two.sided  
12      219   0.149  Wilcoxon rank sum test with continuity correction two.sided  
There were 12 warnings (use warnings() to see them)
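
One way to keep the group labels with this split-based approach is to let split() name each piece and carry those names through as an id column. This is a minimal sketch, not part of the original question: it assumes the group values contain no "." characters (true for this fake data), and the names `pieces`, `results`, and `group` are made up for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(x = c('hot', 'cold'), size = 500, replace = TRUE),
  color = sample(c('red', 'blue', 'green'), 500, TRUE),
  texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
  flavor = sample.int(n = 100, size = 500, replace = TRUE)
)

# Splitting on a list of two vectors names each piece "<color>.<texture>"
pieces <- split(flavor, list(flavor$color, flavor$texture))

results <- pieces %>%
  # One tidy one-row tibble per piece; the list names are kept
  map(~ tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  # Turn the list names into a "group" column (hypothetical column name)
  bind_rows(.id = "group") %>%
  # Split "<color>.<texture>" back into the two original columns
  separate(group, into = c("color", "texture"), sep = "\\.")

results
```

The warnings seen above are still emitted here; they come from wilcox.test() itself, which cannot compute exact p-values when the data contain ties.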

Using a summarise()-like approach (preserves group information, but the statistic is incorrect: every group gets the same value)

> flavor %>%
    group_by(color, texture) %>%
    summarise(output = wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
`summarise()` has grouped output by 'color'. You can override using the `.groups` argument.
# A tibble: 12 × 3
# Groups:   color [3]
   color texture output$statistic $p.value $method                                           $alternative
   <chr> <chr>              <dbl>    <dbl> <chr>                                             <chr>       
 1 blue  crispy            30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 2 blue  crumbly           30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 3 blue  soft              30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 4 blue  wet               30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 5 green crispy            30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 6 green crumbly           30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 7 green soft              30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 8 green wet               30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 9 red   crispy            30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
10 red   crumbly           30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
11 red   soft              30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
12 red   wet               30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
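
The identical statistic in every row comes from `data = .`: with the magrittr pipe, `.` is the whole data frame that was piped into summarise(), not the current group, so every group reruns the test on all 500 rows. A possible fix, sketched here under the assumption that dplyr's group_modify() is available (the name `results` is illustrative), is to let dplyr hand the per-group data to the function:

```r
library(dplyr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(x = c('hot', 'cold'), size = 500, replace = TRUE),
  color = sample(c('red', 'blue', 'green'), 500, TRUE),
  texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
  flavor = sample.int(n = 100, size = 500, replace = TRUE)
)

results <- flavor %>%
  group_by(color, texture) %>%
  # Inside group_modify(), .x is the data for the current group only;
  # the grouping columns are added back to the result automatically
  group_modify(~ tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  ungroup()

results
```

Because group_modify() expects a data frame back from each group, the one-row tibble returned by broom::tidy() fits directly, and the color and texture columns appear in the output without a join.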

Using group_split() (same problem as the first approach: the group information is lost)

> flavor %>%
    group_split(color, texture) %>%
    map_dfr(~wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
# A tibble: 12 × 4
   statistic p.value method                                            alternative
       <dbl>   <dbl> <chr>                                             <chr>      
 1      237   0.687  Wilcoxon rank sum test with continuity correction two.sided  
 2      152.  0.866  Wilcoxon rank sum test with continuity correction two.sided  
 3      236.  0.696  Wilcoxon rank sum test with continuity correction two.sided  
 4      308   0.216  Wilcoxon rank sum test with continuity correction two.sided  
 5      256   0.281  Wilcoxon rank sum test with continuity correction two.sided  
 6      122   0.540  Wilcoxon rank sum test with continuity correction two.sided  
 7      244   0.742  Wilcoxon rank sum test with continuity correction two.sided  
 8      130.  0.0393 Wilcoxon rank sum test with continuity correction two.sided  
 9      238.  0.317  Wilcoxon rank sum test with continuity correction two.sided  
10      360.  0.345  Wilcoxon rank sum test with continuity correction two.sided  
11       75   0.0292 Wilcoxon rank sum test with continuity correction two.sided  
12      219   0.149  Wilcoxon rank sum test with continuity correction two.sided  
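
If group_split() is preferred, the labels can be recovered without a join by taking them from group_keys(), which returns one row per group in the same order as the pieces from group_split(). A minimal sketch under that assumption; `grouped`, `keys`, `tests`, and `results` are illustrative names:

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(x = c('hot', 'cold'), size = 500, replace = TRUE),
  color = sample(c('red', 'blue', 'green'), 500, TRUE),
  texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
  flavor = sample.int(n = 100, size = 500, replace = TRUE)
)

grouped <- flavor %>% group_by(color, texture)

# One row of (color, texture) per group
keys <- group_keys(grouped)

tests <- grouped %>%
  group_split() %>%
  # One tidy one-row tibble per group
  map(~ tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  bind_rows()

# Row i of keys describes row i of tests, so a column bind is enough
results <- bind_cols(keys, tests)

results
```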

Answer 1:

You could use the broom package to get tidy results, with a bit of nesting and unnesting to keep the group columns alongside each test:

library(tidyverse)
library(broom)

flavor %>%
  nest(data = c(-color, -texture)) %>%
  mutate(data = map(data, ~ wilcox.test(flavor ~ temperature, data = .x)),
         data = map(data, tidy)) %>% 
  unnest(data)
#> # A tibble: 12 x 6
#>    color texture statistic p.value method                                alter~1
#>    <chr> <chr>       <dbl>   <dbl> <chr>                                 <chr>  
#>  1 blue  crumbly      157   0.936  Wilcoxon rank sum test with continui~ two.si~
#>  2 red   crispy       242.  0.440  Wilcoxon rank sum test with continui~ two.si~
#>  3 red   crumbly      137   0.609  Wilcoxon rank sum test with continui~ two.si~
#>  4 blue  crispy       409   0.761  Wilcoxon rank sum test with continui~ two.si~
#>  5 green wet          132.  0.248  Wilcoxon rank sum test with continui~ two.si~
#>  6 blue  soft         228.  0.454  Wilcoxon rank sum test with continui~ two.si~
#>  7 blue  wet          209   0.404  Wilcoxon rank sum test with continui~ two.si~
#>  8 red   soft         230.  0.672  Wilcoxon rank sum test with continui~ two.si~
#>  9 green soft         141   0.0808 Wilcoxon rank sum test with continui~ two.si~
#> 10 green crispy       226.  0.178  Wilcoxon rank sum test with continui~ two.si~
#> 11 red   wet          146.  0.0301 Wilcoxon rank sum test with continui~ two.si~
#> 12 green crumbly      164.  0.533  Wilcoxon rank sum test with continui~ two.si~
#> # ... with abbreviated variable name 1: alternative

Created on 2022-09-05 with reprex v2.0.2

Answer 2:

You can use the rstatix package, which provides pipe-friendly wrappers around common statistical tests and respects dplyr grouping, so the group columns are kept in the output:

library(rstatix)
library(tidyverse)

flavor |>
  group_by(color, texture) |>
  wilcox_test(flavor ~ temperature)

# A tibble: 12 x 9
#   color texture .y.    group1 group2    n1    n2 statistic      p
# * <chr> <chr>   <chr>  <chr>  <chr>  <int> <int>     <dbl>  <dbl>
# 1 blue  crispy  flavor cold   hot       21    21      237  0.687 
# 2 blue  crumbly flavor cold   hot       21    14      152. 0.866 
# 3 blue  soft    flavor cold   hot       21    21      236. 0.696 
# 4 blue  wet     flavor cold   hot       22    23      308  0.216 
# 5 green crispy  flavor cold   hot       26    24      256  0.281 
# 6 green crumbly flavor cold   hot       20    14      122  0.54  
# 7 green soft    flavor cold   hot       23    20      244  0.742 
# 8 green wet     flavor cold   hot       20    21      130. 0.0393
# 9 red   crispy  flavor cold   hot       25    23      238. 0.317 
#10 red   crumbly flavor cold   hot       23    27      360. 0.345 
#11 red   soft    flavor cold   hot       16    17       75  0.0292
#12 red   wet     flavor cold   hot       18    19      219  0.149