I'm trying to run a t-test across multiple variables. Say I want to group by am and then test whether mpg is statistically different between the two levels of vs. Here's an old answer using summarize_each, but I'm trying to use across from the dplyr package.
library(tidyverse)
library(broom)
mtcars %>%
  group_by(am) %>%
  summarise_each(funs(
    t.test(.[vs == 0], .[vs == 1])$p.value,
    t.test(.[vs == 0], .[vs == 1])$conf.int[1],
    t.test(.[vs == 0], .[vs == 1])$conf.int[2]
  ),
  vars = mpg)
#> Warning: `summarise_each_()` was deprecated in dplyr 0.7.0.
#> Please use `across()` instead.
#> Warning: `funs()` was deprecated in dplyr 0.8.0.
#> Please use a list of either functions or lambdas:
#>
#> # Simple named list:
#> list(mean = mean, median = median)
#>
#> # Auto named with `tibble::lst()`:
#> tibble::lst(mean, median)
#>
#> # Using lambdas
#> list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> # A tibble: 2 x 4
#> am `vars_$` `vars_[..2` `vars_[..3`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0.000395 -8.33 -3.05
#> 2 1 0.00459 -14.0 -3.27
## clean names via broom
t.test(mtcars %>% filter(am == 0) %>% filter(vs == 0) %>% pull(mpg), mtcars %>% filter(am == 0) %>% filter(vs == 1)%>% pull(mpg)) %>% broom::tidy()
#> # A tibble: 1 x 10
#> estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -5.69 15.0 20.7 -4.63 0.000395 14.0 -8.33 -3.05
#> # ... with 2 more variables: method <chr>, alternative <chr>
t.test(mtcars %>% filter(am == 1) %>% filter(vs == 0) %>% pull(mpg), mtcars %>% filter(am == 1) %>% filter(vs == 1) %>% pull(mpg)) %>% broom::tidy()
#> # A tibble: 1 x 10
#> estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -8.62 19.8 28.4 -3.55 0.00459 11.0 -14.0 -3.27
#> # ... with 2 more variables: method <chr>, alternative <chr>
## how to pass functions into .fns??
mtcars %>%
  group_by(am) %>%
  summarise(across(
    .cols = mpg,
    .fns = list(
      t.test(.[vs == 0], .[vs == 1])$p.value,
      t.test(.[vs == 0], .[vs == 1])$conf.int[1],
      t.test(.[vs == 0], .[vs == 1])$conf.int[2]
    )
  ))
#> Error: Problem with `summarise()` input `..1`.
#> i `..1 = across(...)`.
#> x Must subset columns with a valid subscript vector.
#> i Logical subscripts must match the size of the indexed input.
#> x Input has size 11 but subscript `i` has size 19.
#> i The error occurred in group 1: am = 0.
Created on 2021-09-23 by the reprex package (v2.0.1)
CodePudding user response:
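If we are using tidy() from broom, we can wrap its output in a list inside the lambda, keep the columns we need, and then unnest the resulting list column: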
library(dplyr)
library(broom)
library(tidyr)
mtcars %>%
  group_by(am) %>%
  summarise(across(
    .cols = mpg,
    ~ list(tidy(t.test(.[vs == 0], .[vs == 1])) %>%
      select(p.value, conf.low, conf.high))
  )) %>%
  unnest(mpg)
-output
# A tibble: 2 x 4
am p.value conf.low conf.high
<dbl> <dbl> <dbl> <dbl>
1 0 0.000395 -8.33 -3.05
2 1 0.00459 -14.0 -3.27
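As a possible variant of the same idea, and assuming only mpg needs to be tested, the tidied test can also be computed once per group with group_modify() and the formula interface of t.test(), without going through across(). This is only a sketch: it relies on vs having exactly two levels within every am group, and the object name res is made up for illustration.

```r
library(dplyr)
library(broom)

# Run one Welch two-sample t-test of mpg by vs within each am group,
# tidy the result into a one-row tibble, and keep the columns of interest.
res <- mtcars %>%
  group_by(am) %>%
  group_modify(~ tidy(t.test(mpg ~ vs, data = .x))) %>%
  ungroup() %>%
  select(am, p.value, conf.low, conf.high)

res
```

Compared with extracting each statistic in its own lambda, this runs t.test() a single time per group, which may matter when there are many groups.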
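In the OP's code, each element of the list passed to .fns needs to be a lambda (a formula starting with ~). Without the ~, the t.test() calls are evaluated immediately, with . referring to the whole data frame from the pipe (11 columns) rather than to the mpg column, which is why the logical subscript of size 19 fails. Naming the list elements also gives readable column names: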
mtcars %>%
  group_by(am) %>%
  summarise(across(
    .cols = mpg,
    .fns = list(
      p.value = ~ t.test(.[vs == 0], .[vs == 1])$p.value,
      conf.low = ~ t.test(.[vs == 0], .[vs == 1])$conf.int[1],
      conf.high = ~ t.test(.[vs == 0], .[vs == 1])$conf.int[2]
    )
  ))
-output
# A tibble: 2 x 4
am mpg_p.value mpg_conf.low mpg_conf.high
<dbl> <dbl> <dbl> <dbl>
1 0 0.000395 -8.33 -3.05
2 1 0.00459 -14.0 -3.27
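Since the question mentions several variables, the same named list of lambdas can be applied to more than one column. A minimal sketch, assuming hp and wt are the extra columns of interest (they are chosen here only for illustration, as is the name res_multi). The lambdas are written inline inside across() on purpose: as far as I can tell, defining them outside the summarise() call would stop vs from being found in the grouped data.

```r
library(dplyr)

# Each lambda receives one column of the current group as .x;
# vs is looked up in the same group because the lambdas are defined inline.
res_multi <- mtcars %>%
  group_by(am) %>%
  summarise(across(
    .cols = c(mpg, hp, wt),
    .fns = list(
      p.value   = ~ t.test(.x[vs == 0], .x[vs == 1])$p.value,
      conf.low  = ~ t.test(.x[vs == 0], .x[vs == 1])$conf.int[1],
      conf.high = ~ t.test(.x[vs == 0], .x[vs == 1])$conf.int[2]
    ),
    .names = "{.col}_{.fn}"
  ))

res_multi
```

The result has one row per am group and three columns per tested variable, named like mpg_p.value, hp_conf.low, and so on.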