I'm trying to loop TukeyHSD tests through each column of a dataframe and compare treatment levels. Here's some mock data that's a simplified version of the data I have (my data has ~350 columns):
df1 <- data.frame(cmpd1 = c(500,436,1,1,1,1),
cmpd2 = c(1,1,1,1,1,253),
cmpd3 = c(1,1,300,57,150,260),
treatment=c("W","W","A","A","D","D"))
I've followed the suggestions in this post successfully and have created a loop that runs ANOVAs for each column, outputting only columns that had a p-value <0.07 for the treatment comparisons:
# specific compound differences
for (i in 1:3){
column <- names(df1[i])
anova <- broom::tidy(aov(df1[,i] ~ treatment, data = df1))
# only want aov with P < 0.07 printed
if(anova$p.value[1] < 0.07) {
print(column)
print(anova)
}
}
However, I'd like to run TukeyHSD tests on all columns in a similar way, only outputting the tukey results that have a p-value <0.07 for any given treatment comparison. I tried something like this but it doesn't work, giving the error "Error in if (tukey[["p adj"]] < 0.07) { : argument is of length zero":
for (i in 1:3){
column <- names(df1[i])
anova <- aov(df1[,i] ~ treatment, data = df1)
tukey <- TukeyHSD(anova)
# only want tukey with P < 0.07 printed
if(tukey[["p adj"]] < 0.07) {
print(column)
print(tukey)
}
}
I can't figure out the right way to have it only output tukey tests that contain a p-value <0.07, so my ideal output would be something like this (this contains made-up values):
$cmpd1
diff lwr upr p adj
D-A 2.728484e-12 -29169.59 29169.59 1.0000000
W-A 3.637979e-12 -32278.10 32278.10 0.0001
W-D 1.484573e 04 -13620.88 43312.34 0.056
CodePudding user response:
The output of TukeyHSD
is a list
as evident from the str
ucture
str(TukeyHSD(aov(df1[,1] ~ treatment, data = df1)))
List of 1
$ treatment: num [1:3, 1:4] -2.84e-14 4.67e 02 4.67e 02 -1.09e 02 3.58e 02 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "D-A" "W-A" "W-D"
.. ..$ : chr [1:4] "diff" "lwr" "upr" "p adj"
- attr(*, "class")= chr [1:2] "TukeyHSD" "multicomp"
- attr(*, "orig.call")= language aov(formula = df1[, 1] ~ treatment, data = df1)
- attr(*, "conf.level")= num 0.95
- attr(*, "ordered")= logi FALSE
we can extract the list element 'treatment' which is a matrix
and thus the [[
or $
wouldn't work. We can use [
with column name along with the ,
to separate the row/column index or names and wrap with any
as there are 3 values for 'p adj' (if
expects a single TRUE/FALSE logical input)
for (i in 1:3){
column <- names(df1[i])
anova <- aov(df1[,i] ~ treatment, data = df1)
tukey <- TukeyHSD(anova)
# only want tukey with P < 0.07 printed
if(any(tukey$treatment[, "p adj"] < 0.07)) {
print(column)
print(setNames(tukey, column))
}
}
-output
[1] "cmpd1"
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = df1[, i] ~ treatment, data = df1)
$cmpd1
diff lwr upr p adj
D-A -2.842171e-14 -109.1823 109.1823 1.000000
W-A 4.670000e 02 357.8177 576.1823 0.000839
W-D 4.670000e 02 357.8177 576.1823 0.000839