How to apply t.test() to multiple pairs of columns after mutate across

Time:11-14

This question is related to the question "T-tests across multiple columns or tidy the data".

Data:

df <- structure(list(Subject = 1:3, PreScoreTestA = c(30L, 15L, 20L
), PostScoreTestA = c(40L, 12L, 22L), PreScoreTestB = c(6L, 9L, 
11L), PostScoreTestB = c(8L, 13L, 12L), PreScoreTestC = c(12L, 
7L, 9L), PostScoreTestC = c(10L, 7L, 10L)), class = "data.frame", row.names = c(NA, 
-3L))

> df
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1       1            30             40             6              8            12             10
2       2            15             12             9             13             7              7
3       3            20             22            11             12             9             10

In that question, the OP asks whether t.test can be applied to pairs of columns in a wide-format data frame. A solution using the long format has already been provided there.
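For comparison, a long-format approach could look roughly like the sketch below. It assumes tidyr's pivot_longer() is available and is not a copy of the linked answer; the column names time, test and score are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Same data as above
df <- data.frame(
  Subject        = 1:3,
  PreScoreTestA  = c(30L, 15L, 20L),
  PostScoreTestA = c(40L, 12L, 22L),
  PreScoreTestB  = c(6L, 9L, 11L),
  PostScoreTestB = c(8L, 13L, 12L),
  PreScoreTestC  = c(12L, 7L, 9L),
  PostScoreTestC = c(10L, 7L, 10L)
)

# Split e.g. "PreScoreTestA" into time = "Pre" and test = "TestA"
long <- df %>%
  pivot_longer(-Subject,
               names_to = c("time", "test"),
               names_pattern = "(Pre|Post)Score(.*)",
               values_to = "score")

# One default (Welch) t.test per test, comparing Pre scores against Post scores
res_long <- long %>%
  group_by(test) %>%
  summarise(p.value = t.test(score[time == "Pre"], score[time == "Post"])$p.value)

res_long
```

Grouping by test keeps each Pre/Post pair together, so no column-name matching is needed.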

However, I am trying to answer it by performing the t.test directly in wide format.

My code, using + as the operation (works well):

library(dplyr)
library(stringr)
df %>%
  mutate(across(starts_with('PreScore'), ~ . +
                  get(str_replace(cur_column(), "^PreScore", "PostScore")), .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))

# gives:
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1       1            30             40             6              8            12             10
2       2            15             12             9             13             7              7
3       3            20             22            11             12             9             10
  TestA_TTest TestB_TTest TestC_TTest
1          70          14          22
2          27          22          14
3          42          23          19

Next, I tried replacing + with t.test (this does not work; I tried many variations):

library(dplyr)
library(stringr)
df %>%
  mutate(across(starts_with('PreScore'), ~ . t.test
                  get(str_replace(cur_column(), "^PreScore", "PostScore")), .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))

I would like to know:

Is it possible to apply the t.test function to sets of predefined column pairs inside across, in the same way as +, -, /, etc.?

Further resources I have already looked at:

Looping to get t.test result in R using dplyr

dplyr summarise multiple columns using t.test

apply t.test on every consecutive pair of columns of a data.frame

R: t test over multiple columns using t.test function

Answer:

The output of t.test is a list (an object of class "htest"), so we need to wrap it in list() to store it as a list-column with mutate:

library(dplyr)
library(stringr)
out <- df %>%
  mutate(across(starts_with('PreScore'),
                ~ list(t.test(.,
                              get(str_replace(cur_column(), "^PreScore", "PostScore")))),
                .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))
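Because list(...) has length 1, mutate recycles it, so every row of a *_TTest column holds the same htest object. A minimal sketch for pulling the p-values back out of those list-columns (the name p_values is illustrative):

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Subject        = 1:3,
  PreScoreTestA  = c(30L, 15L, 20L),
  PostScoreTestA = c(40L, 12L, 22L),
  PreScoreTestB  = c(6L, 9L, 11L),
  PostScoreTestB = c(8L, 13L, 12L),
  PreScoreTestC  = c(12L, 7L, 9L),
  PostScoreTestC = c(10L, 7L, 10L)
)

# Store each htest object in a list-column, as in the answer
out <- df %>%
  mutate(across(starts_with("PreScore"),
                ~ list(t.test(., get(str_replace(cur_column(), "^PreScore", "PostScore")))),
                .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with("TTest")), ~ str_remove(., "PreScore"))

# Every row holds the same htest object, so read it from the first row
p_values <- sapply(select(out, ends_with("_TTest")), function(col) col[[1]]$p.value)
p_values
```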

Check the structure with str():

> str(out)
'data.frame':   3 obs. of  10 variables:
 $ Subject       : int  1 2 3
 $ PreScoreTestA : int  30 15 20
 $ PostScoreTestA: int  40 12 22
 $ PreScoreTestB : int  6 9 11
 $ PostScoreTestB: int  8 13 12
 $ PreScoreTestC : int  12 7 9
 $ PostScoreTestC: int  10 7 10
 $ TestA_TTest   :List of 3
  ..$ :List of 10
  .. ..$ statistic  : Named num -0.322
  .. .. ..- attr(*, "names")= chr "t"
  .. ..$ parameter  : Named num 3.07
  .. .. ..- attr(*, "names")= chr "df"
  .. ..$ p.value    : num 0.768
  .. ..$ conf.int   : num  -32.2 26.2
  .. .. ..- attr(*, "conf.level")= num 0.95
  .. ..$ estimate   : Named num  21.7 24.7
  .. .. ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
  .. ..$ null.value : Named num 0
  .. .. ..- attr(*, "names")= chr "difference in means"
  .. ..$ stderr     : num 9.3
  .. ..$ alternative: chr "two.sided"
  .. ..$ method     : chr "Welch Two Sample t-test"
  .. ..$ data.name  : chr "PreScoreTestA and get(str_replace(cur_column(), \"^PreScore\", \"PostScore\"))"
  .. ..- attr(*, "class")= chr "htest"
  ..$ :List of 10
...
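The method element above shows that t.test() defaults to the Welch two-sample test. If the Pre and Post scores come from the same subjects (an assumption about this data, not something stated in the question), a paired test may be more appropriate. A sketch using the same across() pattern with paired = TRUE:

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Subject        = 1:3,
  PreScoreTestA  = c(30L, 15L, 20L),
  PostScoreTestA = c(40L, 12L, 22L),
  PreScoreTestB  = c(6L, 9L, 11L),
  PostScoreTestB = c(8L, 13L, 12L),
  PreScoreTestC  = c(12L, 7L, 9L),
  PostScoreTestC = c(10L, 7L, 10L)
)

# Assumption: each row is one subject measured before and after,
# so paired = TRUE replaces the default Welch two-sample test
paired_p <- df %>%
  summarise(across(starts_with("PreScore"),
                   ~ t.test(., get(str_replace(cur_column(), "^PreScore", "PostScore")),
                            paired = TRUE)$p.value,
                   .names = "{.col}_PairedTTest"))

paired_p
```

The resulting p-values differ from the Welch ones, because the paired test works on the per-subject differences.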

If we only need a particular list element, e.g. the p.value:

df %>%
   mutate(across(starts_with('PreScore'),
      ~  t.test(.,
         get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value, 
     .names = "{.col}_TTest"))
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC PreScoreTestA_TTest
1       1            30             40             6              8            12             10            0.767827
2       2            15             12             9             13             7              7            0.767827
3       3            20             22            11             12             9             10            0.767827
  PreScoreTestB_TTest PreScoreTestC_TTest
1            0.330604           0.8604162
2            0.330604           0.8604162
3            0.330604           0.8604162

Note that with mutate, the same p-value is repeated in every row. Instead, we can use summarise to get a single row:

df %>%
   summarise(across(starts_with('PreScore'), ~  t.test(.,
         get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value, 
      .names = "{.col}_TTest"))
  PreScoreTestA_TTest PreScoreTestB_TTest PreScoreTestC_TTest
1            0.767827            0.330604           0.8604162
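If a base R alternative is acceptable, the same Pre/Post pairing can be done without dplyr, collecting several parts of each htest object into one row per test. This is a sketch, not part of the original answer; the helper names pre_cols, post_cols and the output columns test, statistic, dof and p.value are illustrative choices.

```r
# Same data as in the question
df <- data.frame(
  Subject        = 1:3,
  PreScoreTestA  = c(30L, 15L, 20L),
  PostScoreTestA = c(40L, 12L, 22L),
  PreScoreTestB  = c(6L, 9L, 11L),
  PostScoreTestB = c(8L, 13L, 12L),
  PreScoreTestC  = c(12L, 7L, 9L),
  PostScoreTestC = c(10L, 7L, 10L)
)

# Match each PreScore column with its PostScore partner by name
pre_cols  <- grep("^PreScore", names(df), value = TRUE)
post_cols <- sub("^PreScore", "PostScore", pre_cols)

# Run the default (Welch) t.test per pair and keep one row per test
results <- do.call(rbind, Map(function(pre, post) {
  tt <- t.test(df[[pre]], df[[post]])
  data.frame(test      = sub("^PreScore", "", pre),
             statistic = unname(tt$statistic),
             dof       = unname(tt$parameter),
             p.value   = tt$p.value,
             stringsAsFactors = FALSE)
}, pre_cols, post_cols))
rownames(results) <- NULL

results
```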