I really need some input on an R bioinformatics issue. I suspect this might be a case of me not completely understanding the adjustment step when doing a pairwise anova (using this wrapper https://github.com/pmartinezarbizu/pairwiseAdonis). I've been googling around for a while now, and have not found any answer, so next step is trying here.
I've been using pairwise.adonis as a post hoc test after running the adonis test from the package vegan.
I've come across something I find a little strange, and I'm not sure if it's a bug, or if I've done something wrong, or if the data is just weird like that - any inputs would be much appreciated.
When I run the pairwise.adonis like this:
post_hoc_permanova <- pairwise.adonis(t(otu), meta$Fungicide_treatment, sim.function = "vegdist",
sim.method = "bray", p.adjust.m = "fdr", reduce = NULL, perm = 999)
I get this (sorry if the paste is a bit messy, not sure how to make it look pretty here):
post_hoc_permanova
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
Bentonite vs Esquive 1 0.4309946 2.1228348 0.17511043 0.059 0.2400000
Bentonite vs Tessior 1 0.2354402 1.1522300 0.10331835 0.266 0.3855556
Bentonite vs Control water 1 0.2028768 1.0137438 0.09204353 0.346 0.3855556
Bentonite vs Copper oxychloride 1 0.1778650 1.0765919 0.09719523 0.347 0.3855556
Esquive vs Tessior 1 0.4197328 1.8764408 0.15799690 0.068 0.2400000
Esquive vs Control water 1 0.3258340 1.4845894 0.12926796 0.187 0.3740000
Esquive vs Copper oxychloride 1 0.4070595 2.2055388 0.18069983 0.072 0.2400000
Tessior vs Control water 1 0.2583331 1.1700713 0.10475057 0.288 0.3855556
Tessior vs Copper oxychloride 1 0.3041160 1.6361852 0.14061182 0.113 0.2825000
Control water vs Copper oxychloride 1 0.1205611 0.6636605 0.06223571 0.594 0.594000
I get all different p-values, but when they're adjusted, suddenly a lot of them have exactly the same adjusted values (e.g. p.adjusted sig = 0.2400000 for both p.value=0.059, 0.068 and 0.072). I'm at a loss to explain why this happens - if adjusting by any standard, the values would still be slightly different, depending on the "original" p-value, wouldn't they? If anyone can enlighten me, I'm all ears.
My dataset looks like this (full tables available here: https://github.com/Marieag/LeaSyBiome):
head(meta)
SampleID Real_Sample_Name Year Location Cultivar Fungicide_treatment
GF.ITS.VL31 GF.ITS.VL31 <NA> 2022 Lisbon Syrah Bentonite
GF.ITS.VL32 GF.ITS.VL32 <NA> 2022 Lisbon Syrah Bentonite
GF.ITS.VL33 GF.ITS.VL33 <NA> 2022 Lisbon Syrah Bentonite
GF.ITS.VL34 GF.ITS.VL34 <NA> 2022 Lisbon Syrah Bentonite
GF.ITS.VL35 GF.ITS.VL35 <NA> 2022 Lisbon Syrah Bentonite
GF.ITS.VL36 GF.ITS.VL36 <NA> 2022 Lisbon Syrah Bentonite
head(otu)
GF.ITS.VL31 GF.ITS.VL32 GF.ITS.VL33 GF.ITS.VL34 GF.ITS.VL35 GF.ITS.VL36 GF.ITS.VL37 GF.ITS.VL38
g__Kondoa 0.000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000 0 0.000000000 0.0000000000
g__Sarocladium 0.000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000 0 0.000000000 0.0000000000
g__Symmetrospora_1 0.001181818 0.0000000000 0.000000000 0.0000000000 0.0000000000 0 0.000000000 0.0000000000
o__Capnodiales 0.000000000 0.0006363636 0.004272727 0.0006363636 0.0004545455 0 0.001363636 0.0008181818
s__Candida_cretensis 0.000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000 0 0.000000000 0.0000000000
c__Leotiomycetes 0.000000000 0.0000000000 0.001454545 0.0086363636 0.0008181818 0 0.001363636 0.0012727273
GF.ITS.VL39 GF.ITS.VL40 GF.ITS.VL41 GF.ITS.VL42 GF.ITS.VL43 GF.ITS.VL44 GF.ITS.VL45
g__Kondoa 0.000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.000000e 00 0.000000e 00
g__Sarocladium 0.000000000 0.0000000000 0.0000000000 0.0025454545 0.0000000000 0.000000e 00 0.000000e 00
g__Symmetrospora_1 0.000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 9.090909e-05 0.000000e 00
o__Capnodiales 0.000000000 0.0005454545 0.0004545455 0.0002727273 0.0042727273 5.181818e-03 9.090909e-05
s__Candida_cretensis 0.000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.000000e 00 0.000000e 00
c__Leotiomycetes 0.001454545 0.0000000000 0.0000000000 0.0000000000 0.0006363636 1.818182e-04 0.000000e 00
GF.ITS.VL46 GF.ITS.VL47 GF.ITS.VL48 GF.ITS.VL49 GF.ITS.VL50 GF.ITS.VL51 GF.ITS.VL52
g__Kondoa 0.0000000000 0 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0
g__Sarocladium 0.0000000000 0 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0
g__Symmetrospora_1 0.0000000000 0 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0
o__Capnodiales 0.0007272727 0 0.0000000000 0.0003636364 0.0006363636 0.0000000000 0
s__Candida_cretensis 0.0000000000 0 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0
c__Leotiomycetes 0.0000000000 0 0.0007272727 0.0000000000 0.0000000000 0.0001818182 0
GF.ITS.VL53 GF.ITS.VL54 GF.ITS.VL56 GF.ITS.VL57 GF.ITS.VL58 GF.ITS.VL59 GF.ITS.VL60
g__Kondoa 0 0 0.000000000 0 9.090909e-05 0.0000000000 0
g__Sarocladium 0 0 0.000000000 0 0.000000e 00 0.0000000000 0
g__Symmetrospora_1 0 0 0.000000000 0 0.000000e 00 0.0000000000 0
o__Capnodiales 0 0 0.001727273 0 3.636364e-04 0.0000000000 0
s__Candida_cretensis 0 0 0.000000000 0 3.818182e-03 0.0002727273 0
c__Leotiomycetes 0 0 0.000000000 0 0.000000e 00 0.0000000000 0
I hope someone out there can help me. Thanks!
CodePudding user response:
The 'adjusted' values are often the same because of how the Benjamini-Hochberg method works.
The Benjamini-Hochberg adjusted p-values are calculated from the unadjusted p-values as follows:
- First order the p-values from smallest to largest.
- Multiply each by
N/i
, whereN
is the total number andi
is the rank order - Set any values bigger than one to be equal to 1
- Finally make sure the sequence always increases, by setting the adjusted p-value equal the subsequent value if the subsequent value is lower.
So your p-values are:
ps = c(0.059, 0.266, 0.346, 0.347, 0.068, 0.187, 0.072, 0.288, 0.113, 0.594)
if we sort them:
> p2 <- ps[order(ps)]
> p2
[1] 0.059 0.068 0.072 0.113 0.187 0.266 0.288 0.346 0.347 0.594
now make the adjustment:
> pa <- p2 * 10/(1:10)
> pa
[1] 0.5900000 0.3400000 0.2400000 0.2825000 0.3740000 0.4433333 0.4114286 0.4325000 0.3855556 0.5940000
You'll see that the 3rd value is 0.24, so we set the first and the second to 0.24 as well (because they were higher). The ninth value is 0.3856, so we set the 6th to 8th to 0.3856 as well. Then the final sequence will be the BH corrected values:
> p.adjust(p2[order(p2)], method="BH")
[1] 0.2400000 0.2400000 0.2400000 0.2825000 0.3740000 0.3855556 0.3855556 0.3855556 0.3855556 0.5940000
And this is what you see in your table.
The wisdom of using a BH correction for post-hoc ANOVA comparisons is another question altogether!