Home > Software engineering >  Looping TukeyHSD function through multiple columns of a dataframe
Looping TukeyHSD function through multiple columns of a dataframe

Time:09-17

I'm trying to loop TukeyHSD tests through each column of a dataframe and compare treatment levels. Here's some mock data that's a simplified version of the data I have (my data has ~350 columns):

df1 <- data.frame(cmpd1 = c(500,436,1,1,1,1),
                  cmpd2 = c(1,1,1,1,1,253),
                  cmpd3 = c(1,1,300,57,150,260),
                  treatment=c("W","W","A","A","D","D"))

I've followed the suggestions in this post successfully and have created a loop that runs ANOVAs for each column, outputting only columns that had a p-value <0.07 for the treatment comparisons:

# specific compound differences
for (i in 1:3){
  column <- names(df1[i])
  anova <- broom::tidy(aov(df1[,i] ~ treatment, data = df1))
  
  # only want aov with P < 0.07 printed
  if(anova$p.value[1] < 0.07) {
    
    print(column)
    print(anova)
  }
}

However, I'd like to run TukeyHSD tests on all columns in a similar way, only outputting the tukey results that have a p-value <0.07 for any given treatment comparison. I tried something like this but it doesn't work, giving the error "Error in if (tukey[["p adj"]] < 0.07) { : argument is of length zero":

for (i in 1:3){
  column <- names(df1[i])
  anova <- aov(df1[,i] ~ treatment, data = df1)
  tukey <- TukeyHSD(anova)
  
  # only want tukey with P < 0.07 printed
  if(tukey[["p adj"]] < 0.07) {
    
    print(column)
    print(tukey)
  }
}

I can't figure out the right way to have it only output tukey tests that contain a p-value <0.07, so my ideal output would be something like this (this contains made-up values):

$cmpd1
                    diff       lwr      upr     p adj
D-A          2.728484e-12 -29169.59 29169.59 1.0000000
W-A          3.637979e-12 -32278.10 32278.10 0.0001
W-D          1.484573e 04 -13620.88 43312.34 0.056

CodePudding user response:

The output of TukeyHSD is a list as evident from the structure

str(TukeyHSD(aov(df1[,1] ~ treatment, data = df1)))
List of 1
 $ treatment: num [1:3, 1:4] -2.84e-14 4.67e 02 4.67e 02 -1.09e 02 3.58e 02 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "D-A" "W-A" "W-D"
  .. ..$ : chr [1:4] "diff" "lwr" "upr" "p adj"
 - attr(*, "class")= chr [1:2] "TukeyHSD" "multicomp"
 - attr(*, "orig.call")= language aov(formula = df1[, 1] ~ treatment, data = df1)
 - attr(*, "conf.level")= num 0.95
 - attr(*, "ordered")= logi FALSE

we can extract the list element 'treatment' which is a matrix and thus the [[ or $ wouldn't work. We can use [ with column name along with the , to separate the row/column index or names and wrap with any as there are 3 values for 'p adj' (if expects a single TRUE/FALSE logical input)

for (i in 1:3){
  column <- names(df1[i])
  anova <- aov(df1[,i] ~ treatment, data = df1)
  tukey <- TukeyHSD(anova)
  
  # only want tukey with P < 0.07 printed
  if(any(tukey$treatment[, "p adj"] < 0.07)) {
    
    print(column)
     print(setNames(tukey, column))

  }
}

-output

[1] "cmpd1"
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = df1[, i] ~ treatment, data = df1)

$cmpd1
             diff       lwr      upr    p adj
D-A -2.842171e-14 -109.1823 109.1823 1.000000
W-A  4.670000e 02  357.8177 576.1823 0.000839
W-D  4.670000e 02  357.8177 576.1823 0.000839
  • Related