Home > Mobile >  Running multiply Anova and/or T-Test on variables in one dataframe in R
Running multiply Anova and/or T-Test on variables in one dataframe in R

Time:07-11

I've the a data frame containing different items (and it's cost) and also it's subsequent groupings. I would like to run an Anova and/or T-Test for each item based on their groupings to see if their mean differs. Anybody knows how to do this in R?

A sample of the dataframe is as follow:

Item Cost Grouping
Book A 7 A
Book A 9 B
Book A 6 A
Book A 7 B
Book B 4 A
Book B 6 B
Book B 5 A
Book B 3 C
Book C 5 C
Book C 4 A
Book C 7 C
Book C 2 B
Book C 2 B
Book D 4 A
Book D 2 C
Book D 9 C
Book D 4 A

The output should be a simple table (or any similar table) as follows

Item P-Value (from ANOVA/t-test) (H0: Mean same for all groupings)
Book A xxx
Book B xxx
Book C xxx
Book D xxx

Thanks in advance!

CodePudding user response:

Instead of dealing with multiple ANOVA, t-tests and worrisome (and potentially questionable) p-values, I would fit a single generalised linear mixed-effect model with group as a random effect. This is easy to do in a fully Bayesian way using rstanarm, which gives full posterior distributions for the means of every item. Instead of worrying about the suitability & interpretability of (multiple) hypothesis tests, we can then compare posterior distributions for the means directly.

library(rstanarm)
model <- stan_glmer(cost ~ 0   item   (1 | group), data = df)

We can summarise the mean posterior distributions by showing the posterior median and 90% posterior uncertainty intervals per item.

library(broom.mixed)
tidy(model, conf.int = TRUE) %>%
    ggplot(aes(y = term))   
    geom_point(aes(x = estimate))   
    geom_linerange(aes(xmin = conf.low, xmax = conf.high))

enter image description here

Or as a table

tidy(mode, conf.int = TRUE)
## A tibble: 4 × 5
#  term       estimate std.error conf.low conf.high
#  <chr>         <dbl>     <dbl>    <dbl>     <dbl>
#1 itemBook A     7.28      1.17     5.09      9.40
#2 itemBook B     4.44      1.16     2.27      6.45
#3 itemBook C     3.88      1.05     1.89      5.75
#4 itemBook D     4.63      1.21     2.41      6.71

Here,

  • estimate is the posterior median,
  • std.error is the posterior MAD, and
  • conf.low and conf.high are the lower and upper bounds of the 90% posterior uncertainty interval.

CodePudding user response:

You could use anova_test from the rstatix package like this:

df <- data.frame(Item = c("Book A", "Book A", "Book A", "Book A", "Book B", "Book B", "Book B", "Book B"),
                 Cost = c(7,9,6,7,4,6,5,3),
                 Grouping = c("A", "B", "A", "B", "A", "B", "A", "C"))

library(dplyr)
library(rstatix)
df %>% 
  group_by(Item) %>%
  anova_test(Cost ~ Grouping)
#> Coefficient covariances computed by hccm()
#> Coefficient covariances computed by hccm()
#> # A tibble: 2 × 8
#>   Item   Effect     DFn   DFd     F     p `p<.05`   ges
#> * <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <chr>   <dbl>
#> 1 Book A Grouping     1     2   1.8 0.312 ""      0.474
#> 2 Book B Grouping     2     1   4.5 0.316 ""      0.9

Created on 2022-07-10 by the reprex package (v2.0.1)

  • Related