I am looking for a way to use the syntax of group_split() or summarise() while preserving the original group information. I've seen some previous pages like here and here that use these approaches, but they don't preserve the grouping information. Is there a way to do this? I could of course join the data back, but I was hoping to avoid that approach.
> set.seed(22)
> # Create fake data
> flavor <- data.frame(
    temperature = sample(x = c('hot','cold'), size = 500, replace = TRUE),
    color = sample(c('red','blue','green'), 500, TRUE),
    texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
    flavor = sample.int(n = 100, size = 500, replace = TRUE)
)
>
> head(flavor, 10)
temperature color texture flavor
1 cold red soft 47
2 hot red crumbly 2
3 cold blue crispy 28
4 cold blue soft 36
5 cold blue crumbly 69
6 cold red soft 49
7 cold blue soft 100
8 hot blue crumbly 42
9 hot blue soft 93
10 hot green wet 47
Using base split() with purrr::map() (works, but doesn't preserve the original group information):
> flavor %>%
    group_by(color, texture) %>%
    mutate(subsets = cur_group_id()) %>%
    ungroup() %>%
    base::split(.$subsets) %>%
    purrr::map(~ wilcox.test(flavor ~ temperature, data = .)) %>%
    purrr::map_dfr(~ broom::tidy(.))
# A tibble: 12 × 4
statistic p.value method alternative
<dbl> <dbl> <chr> <chr>
1 237 0.687 Wilcoxon rank sum test with continuity correction two.sided
2 152. 0.866 Wilcoxon rank sum test with continuity correction two.sided
3 236. 0.696 Wilcoxon rank sum test with continuity correction two.sided
4 308 0.216 Wilcoxon rank sum test with continuity correction two.sided
5 256 0.281 Wilcoxon rank sum test with continuity correction two.sided
6 122 0.540 Wilcoxon rank sum test with continuity correction two.sided
7 244 0.742 Wilcoxon rank sum test with continuity correction two.sided
8 130. 0.0393 Wilcoxon rank sum test with continuity correction two.sided
9 238. 0.317 Wilcoxon rank sum test with continuity correction two.sided
10 360. 0.345 Wilcoxon rank sum test with continuity correction two.sided
11 75 0.0292 Wilcoxon rank sum test with continuity correction two.sided
12 219 0.149 Wilcoxon rank sum test with continuity correction two.sided
There were 12 warnings (use warnings() to see them)
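One way to carry the labels through a base split() is to let split() name each piece after its group and then turn those names back into columns. The following is only a sketch under a few assumptions: the intermediate names `pieces` and `labelled` are illustrative, the group values contain no `.` characters (split() joins the labels with `.` by default), and every color/texture combination contains both temperatures.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, TRUE),
  color = sample(c("red", "blue", "green"), 500, TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, TRUE),
  flavor = sample.int(100, 500, TRUE)
)

# Splitting on a list of variables names each piece "<color>.<texture>";
# drop = TRUE skips combinations that have no rows.
pieces <- split(flavor, list(flavor$color, flavor$texture), drop = TRUE)

labelled <- pieces %>%
  # .id stores the name of each list element in a "group" column
  map_dfr(~ tidy(wilcox.test(flavor ~ temperature, data = .x)), .id = "group") %>%
  # split the "<color>.<texture>" label back into two columns
  separate(group, into = c("color", "texture"), sep = "\\.")

labelled
```

Because the labels travel as list names, no join is needed, but the approach relies on the separator not occurring inside the group values.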
Using a summarise()-like approach (preserves the group information, but the statistic is incorrect):
> flavor %>%
    group_by(color, texture) %>%
    summarise(output = wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
`summarise()` has grouped output by 'color'. You can override using the `.groups` argument.
# A tibble: 12 × 3
# Groups: color [3]
color texture output$statistic $p.value $method $alternative
<chr> <chr> <dbl> <dbl> <chr> <chr>
1 blue crispy 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
2 blue crumbly 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
3 blue soft 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
4 blue wet 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
5 green crispy 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
6 green crumbly 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
7 green soft 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
8 green wet 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
9 red crispy 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
10 red crumbly 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
11 red soft 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
12 red wet 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
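The identical statistic in every row is consistent with how the magrittr pipe works: inside summarise(), `.` still refers to the whole data frame passed down the pipe, not to the rows of the current group, so each group runs the test on all 500 observations. A minimal sketch of one possible fix, assuming dplyr's group_modify() is acceptable as a summarise()-like alternative (the name `per_group` is illustrative):

```r
library(dplyr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, TRUE),
  color = sample(c("red", "blue", "green"), 500, TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, TRUE),
  flavor = sample.int(100, 500, TRUE)
)

per_group <- flavor %>%
  group_by(color, texture) %>%
  # .x holds only the rows of the current group, .y is a one-row tibble of its keys;
  # group_modify() prepends the grouping columns to whatever the function returns
  group_modify(function(.x, .y) tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  ungroup()

per_group
```

Here the test sees only the current group's rows, and the color and texture columns are kept without a join.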
Using group_split() (same problem as the first approach):
> flavor %>%
    group_split(color, texture) %>%
    map_dfr(~wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
# A tibble: 12 × 4
statistic p.value method alternative
<dbl> <dbl> <chr> <chr>
1 237 0.687 Wilcoxon rank sum test with continuity correction two.sided
2 152. 0.866 Wilcoxon rank sum test with continuity correction two.sided
3 236. 0.696 Wilcoxon rank sum test with continuity correction two.sided
4 308 0.216 Wilcoxon rank sum test with continuity correction two.sided
5 256 0.281 Wilcoxon rank sum test with continuity correction two.sided
6 122 0.540 Wilcoxon rank sum test with continuity correction two.sided
7 244 0.742 Wilcoxon rank sum test with continuity correction two.sided
8 130. 0.0393 Wilcoxon rank sum test with continuity correction two.sided
9 238. 0.317 Wilcoxon rank sum test with continuity correction two.sided
10 360. 0.345 Wilcoxon rank sum test with continuity correction two.sided
11 75 0.0292 Wilcoxon rank sum test with continuity correction two.sided
12 219 0.149 Wilcoxon rank sum test with continuity correction two.sided
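group_split() drops the keys, but dplyr's group_keys() returns them in the same order as the pieces, so the two can be bound column-wise. This is a sketch rather than a definitive solution; the names `grouped`, `keys`, `results`, and `with_keys` are illustrative and not from the original post.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, TRUE),
  color = sample(c("red", "blue", "green"), 500, TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, TRUE),
  flavor = sample.int(100, 500, TRUE)
)

grouped <- flavor %>% group_by(color, texture)

# One row per group, in the same order group_split() uses
keys <- group_keys(grouped)

# One tidy test result per group
results <- grouped %>%
  group_split() %>%
  map_dfr(~ tidy(wilcox.test(flavor ~ temperature, data = .x)))

# Bind keys and results side by side; rows line up by group order
with_keys <- bind_cols(keys, results)
with_keys
```

This keeps the group_split() workflow intact and only adds the key columns at the end.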
CodePudding user response:
You could use the broom package to get tidy results, aided by a bit of nesting and unnesting:
library(tidyverse)
library(broom)
flavor %>%
  nest(data = c(-color, -texture)) %>%
  mutate(data = map(data, ~ wilcox.test(flavor ~ temperature, data = .x)),
         data = map(data, tidy)) %>%
  unnest(data)
#> # A tibble: 12 x 6
#> color texture statistic p.value method alter~1
#> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 blue crumbly 157 0.936 Wilcoxon rank sum test with continui~ two.si~
#> 2 red crispy 242. 0.440 Wilcoxon rank sum test with continui~ two.si~
#> 3 red crumbly 137 0.609 Wilcoxon rank sum test with continui~ two.si~
#> 4 blue crispy 409 0.761 Wilcoxon rank sum test with continui~ two.si~
#> 5 green wet 132. 0.248 Wilcoxon rank sum test with continui~ two.si~
#> 6 blue soft 228. 0.454 Wilcoxon rank sum test with continui~ two.si~
#> 7 blue wet 209 0.404 Wilcoxon rank sum test with continui~ two.si~
#> 8 red soft 230. 0.672 Wilcoxon rank sum test with continui~ two.si~
#> 9 green soft 141 0.0808 Wilcoxon rank sum test with continui~ two.si~
#> 10 green crispy 226. 0.178 Wilcoxon rank sum test with continui~ two.si~
#> 11 red wet 146. 0.0301 Wilcoxon rank sum test with continui~ two.si~
#> 12 green crumbly 164. 0.533 Wilcoxon rank sum test with continui~ two.si~
#> # ... with abbreviated variable name 1: alternative
Created on 2022-09-05 with reprex v2.0.2
CodePudding user response:
You can use the rstatix package, which is designed to perform several statistical tests within the tidyverse.
library(rstatix)
library(tidyverse)
flavor |>
  group_by(color, texture) |>
  wilcox_test(flavor ~ temperature)
# A tibble: 12 x 9
# color texture .y. group1 group2 n1 n2 statistic p
# * <chr> <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
# 1 blue crispy flavor cold hot 21 21 237 0.687
# 2 blue crumbly flavor cold hot 21 14 152. 0.866
# 3 blue soft flavor cold hot 21 21 236. 0.696
# 4 blue wet flavor cold hot 22 23 308 0.216
# 5 green crispy flavor cold hot 26 24 256 0.281
# 6 green crumbly flavor cold hot 20 14 122 0.54
# 7 green soft flavor cold hot 23 20 244 0.742
# 8 green wet flavor cold hot 20 21 130. 0.0393
# 9 red crispy flavor cold hot 25 23 238. 0.317
#10 red crumbly flavor cold hot 23 27 360. 0.345
#11 red soft flavor cold hot 16 17 75 0.0292
#12 red wet flavor cold hot 18 19 219 0.149