This question is related to this one: "T-tests across multiple columns or tidy the data".
data:
df <- structure(list(Subject = 1:3, PreScoreTestA = c(30L, 15L, 20L
), PostScoreTestA = c(40L, 12L, 22L), PreScoreTestB = c(6L, 9L,
11L), PostScoreTestB = c(8L, 13L, 12L), PreScoreTestC = c(12L,
7L, 9L), PostScoreTestC = c(10L, 7L, 10L)), class = "data.frame", row.names = c(NA,
-3L))
> df
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1       1            30             40             6              8            12             10
2       2            15             12             9             13             7              7
3       3            20             22            11             12             9             10
There the OP asks whether it is possible to apply t.test to pairs of columns in a wide-format data frame. A solution using long format has already been provided.
However, I would like to perform the t.test directly in wide format.
My code using + as the function (works well):
library(dplyr)
library(stringr)
df %>%
  mutate(across(starts_with('PreScore'), ~ . +
    get(str_replace(cur_column(), "^PreScore", "PostScore")), .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))
# gives:
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1       1            30             40             6              8            12             10
2       2            15             12             9             13             7              7
3       3            20             22            11             12             9             10
  TestA_TTest TestB_TTest TestC_TTest
1          70          14          22
2          27          22          14
3          42          23          19
Now I thought I could replace the + function with t.test (which does not work; I tried many variations):
library(dplyr)
library(stringr)
df %>%
mutate(across(starts_with('PreScore'), ~ . t.test
get(str_replace(cur_column(), "^PreScore", "PostScore")), .names = "{.col}_TTest")) %>%
rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))
I would like to know:
Is it possible to apply the t.test function to sets of predefined column pairs inside across, as is possible for +, -, /, etc.?
Further resources I have been through:
Looping to get t.test result in R using dplyr
dplyr summarise multiple columns using t.test
apply t.test on every consecutive pair of columns of a data.frame
R: t test over multiple columns using t.test function
Answer:
The t.test output is a list, so we may need to wrap it in a list to store it in a column with mutate:
library(dplyr)
library(stringr)
out <- df %>%
mutate(across(starts_with('PreScore'),
~list(t.test(.,
get(str_replace(cur_column(), "^PreScore", "PostScore")))),
.names = "{.col}_TTest")) %>%
rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))
Check the structure with str():
> str(out)
'data.frame': 3 obs. of 10 variables:
$ Subject : int 1 2 3
$ PreScoreTestA : int 30 15 20
$ PostScoreTestA: int 40 12 22
$ PreScoreTestB : int 6 9 11
$ PostScoreTestB: int 8 13 12
$ PreScoreTestC : int 12 7 9
$ PostScoreTestC: int 10 7 10
$ TestA_TTest :List of 3
..$ :List of 10
.. ..$ statistic : Named num -0.322
.. .. ..- attr(*, "names")= chr "t"
.. ..$ parameter : Named num 3.07
.. .. ..- attr(*, "names")= chr "df"
.. ..$ p.value : num 0.768
.. ..$ conf.int : num -32.2 26.2
.. .. ..- attr(*, "conf.level")= num 0.95
.. ..$ estimate : Named num 21.7 24.7
.. .. ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
.. ..$ null.value : Named num 0
.. .. ..- attr(*, "names")= chr "difference in means"
.. ..$ stderr : num 9.3
.. ..$ alternative: chr "two.sided"
.. ..$ method : chr "Welch Two Sample t-test"
.. ..$ data.name : chr "PreScoreTestA and get(str_replace(cur_column(), \"^PreScore\", \"PostScore\"))"
.. ..- attr(*, "class")= chr "htest"
..$ :List of 10
...
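Since each row of a list-column like the one above holds the same htest object, values can also be pulled out after the fact. The following is a minimal sketch, not part of the original answer; the names first_tests and p_values are made up for illustration, and it assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Subject = 1:3,
  PreScoreTestA = c(30L, 15L, 20L), PostScoreTestA = c(40L, 12L, 22L),
  PreScoreTestB = c(6L, 9L, 11L),   PostScoreTestB = c(8L, 13L, 12L),
  PreScoreTestC = c(12L, 7L, 9L),   PostScoreTestC = c(10L, 7L, 10L)
)

# Rebuild the list-columns exactly as in the answer
out <- df %>%
  mutate(across(starts_with("PreScore"),
                ~ list(t.test(.,
                              get(str_replace(cur_column(), "^PreScore", "PostScore")))),
                .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with("TTest")), ~ str_remove(., "PreScore"))

# Every row stores the same test, so take the first element of each list-column
# (first_tests and p_values are illustrative names)
first_tests <- lapply(out[grep("_TTest$", names(out))], `[[`, 1)

# One p-value per Pre/Post pair, named after the list-column it came from
p_values <- sapply(first_tests, function(x) x$p.value)
p_values
```

Other components such as statistic or conf.int can be read the same way by changing the element accessed inside the sapply call.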
If we need to extract only a particular list element, e.g. p.value:
df %>%
mutate(across(starts_with('PreScore'),
~ t.test(.,
get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value,
.names = "{.col}_TTest"))
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC PreScoreTestA_TTest
1       1            30             40             6              8            12             10            0.767827
2       2            15             12             9             13             7              7            0.767827
3       3            20             22            11             12             9             10            0.767827
  PreScoreTestB_TTest PreScoreTestC_TTest
1            0.330604           0.8604162
2            0.330604           0.8604162
3            0.330604           0.8604162
Note that by using mutate we are storing the same information in every row. Instead, we may use summarise:
df %>%
summarise(across(starts_with('PreScore'), ~ t.test(.,
get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value,
.names = "{.col}_TTest"))
  PreScoreTestA_TTest PreScoreTestB_TTest PreScoreTestC_TTest
1            0.767827            0.330604           0.8604162
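One caveat: the str() output earlier reports the method as "Welch Two Sample t-test", meaning t.test treats the two columns as independent samples by default. If the Pre and Post scores come from the same subjects and a paired test is what is wanted (an assumption about the study design, not something stated in the question), paired = TRUE can be passed to the same call. A minimal sketch, with the "_paired_p" suffix and the name paired_p chosen only for illustration:

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Subject = 1:3,
  PreScoreTestA = c(30L, 15L, 20L), PostScoreTestA = c(40L, 12L, 22L),
  PreScoreTestB = c(6L, 9L, 11L),   PostScoreTestB = c(8L, 13L, 12L),
  PreScoreTestC = c(12L, 7L, 9L),   PostScoreTestC = c(10L, 7L, 10L)
)

# Same pairing trick as above; paired = TRUE is the only change to the t.test call
# (assumes each row is one subject measured before and after)
paired_p <- df %>%
  summarise(across(starts_with("PreScore"),
                   ~ t.test(.,
                            get(str_replace(cur_column(), "^PreScore", "PostScore")),
                            paired = TRUE)$p.value,
                   .names = "{.col}_paired_p"))
paired_p
```

The result is again a one-row data frame with one p-value column per Pre/Post pair.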