T-tests across multiple columns or tidy the data

Time:11-14

New to posting to Stack so apologies for any issues.

I'm learning to get more comfortable in R and am currently looking at using broom/purrr to run multiple statistical tests at once. An example of my current data looks like this:

Subject  PreScoreTestA  PostScoreTestA  PreScoreTestB  PostScoreTestB  PreScoreTestC  PostScoreTestC
1        30             40              6              8               12             10
2        15             12              9              13              7              7
3        20             22              11             12              9              10

The real data has many more subjects and more tests. I want to run a dependent (paired) t-test to see whether scores changed over the course of a training program, but I don't want to run a separate test by hand for each score.

I've seen a couple of examples of people using group_by, nest, and map to run multiple t-tests, but their data was in a longer format.

Is there a way to achieve the same goal while the data is in a wide format, or will I need to use pivot_longer to reshape it?

Thanks in advance!

ETA: I had an edit here, but it was giving incorrect results, so I have removed it. I'm still looking for some help with the arguments and the same-length issue.

Answer:

Yes, some pivoting is needed. Assuming you have no directional hypotheses (so a two-sided test is appropriate) and you want a pre-post comparison for each test, this might be what you are looking for:

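library(tidyverse)  # provides %>%, pivot_longer(), nest(), mutate(), map()

# Example data in wide format: one row per subject, Pre/Post columns per test
df <- as.data.frame(rbind(c(1,  30, 40, 6,  8,  12, 10),
                          c(2,  15, 12, 9,  13, 7,  7),
                          c(3,  20, 22, 11, 12, 9,  10)))

names(df) <- c("Subject",   
               "PrePushup", "PostPushup",   
               "PreRun",    "PostRun",  
               "PreJump",   "PostJump")

df %>% 
  # split each column name into a time part (Pre/Post) and a test name
  pivot_longer(-Subject, 
               names_to = c("time", "test"), values_to = "score", 
               names_pattern = "(Pre|Post)(.*)") %>% 
  # one nested data frame per test
  group_by(test) %>% 
  nest() %>% 
  # paired, two-sided t-test within each test
  mutate(t_tests = map(data, ~t.test(score ~ time, data = .x, paired = TRUE))) %>% 
  pull(t_tests) %>% 
  # names are given in the order the tests first appear in the data
  purrr::set_names(c("Pushup", "Run", "Jump"))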

$Pushup

    Paired t-test

data:  score by time
t = 0.79241, df = 2, p-value = 0.5112
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -13.28958  19.28958
sample estimates:
mean of the differences 
                      3 


$Run

    Paired t-test

data:  score by time
t = 2.6458, df = 2, p-value = 0.1181
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.461250  6.127916
sample estimates:
mean of the differences 
               2.333333 


$Jump

    Paired t-test

data:  score by time
t = -0.37796, df = 2, p-value = 0.7418
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.127916  3.461250
sample estimates:
mean of the differences 
             -0.3333333 
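
Since the question mentions broom, here is a hedged sketch of a variant that returns one row per test instead of a list of printed results. It is not part of the original answer. It assumes the same Pre/Post column naming as the example above, and `results` is just an illustrative name. It uses the ".value" sentinel in pivot_longer so that Pre and Post stay as two columns, then passes them to t.test() as two vectors. That keeps the pairing by subject explicit and avoids combining a formula with paired = TRUE, which recent R releases (4.4.0 and later, as far as I know) reject.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same example data as above, built column by column
df <- data.frame(
  Subject    = 1:3,
  PrePushup  = c(30, 15, 20),
  PostPushup = c(40, 12, 22),
  PreRun     = c(6, 9, 11),
  PostRun    = c(8, 13, 12),
  PreJump    = c(12, 7, 9),
  PostJump   = c(10, 7, 10)
)

results <- df %>%
  # ".value" keeps Pre and Post as separate columns:
  # one row per subject and test, with columns Subject, test, Pre, Post
  pivot_longer(-Subject,
               names_to = c(".value", "test"),
               names_pattern = "(Pre|Post)(.*)") %>%
  # one nested data frame per test
  nest(data = -test) %>%
  # paired t-test of Post against Pre, tidied into a one-row tibble
  mutate(tidied = map(data, ~ tidy(t.test(.x$Post, .x$Pre, paired = TRUE)))) %>%
  select(test, tidied) %>%
  unnest(tidied)

results
```

Each row of `results` should hold the estimate (mean of Post minus Pre), t statistic, p-value, and confidence limits for one test, so the numbers can be compared directly with the printed output above and filtered or sorted like any other data frame.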