While I was trying to perform a paired Wilcoxon test for the variable:"Metabolite", I noticed different P-values between wilcox_test() & wilcox.test(), when I tested the following variable:
structure(list(Visit = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("BSL", "LV"), class = "factor"), Metabolite = c(NA,
9.602, 9.0102, 4.524, 3.75, NA, 6.596, 7.065, 6.877, NA, NA,
10.1846, 13.521, 7.8219, NA, 4.9149, 4.0754, 4.7635, 8.8554,
4.3442, NA, 16.659, NA, 3.698, 6.623, 5.158, 11.719, 3.206, NA,
2.225, 7.417, 1.42, NA, NA, 2.752, 6.504, 7.594, 6.652, NA, NA,
3.784, 2.7311, 4.1749, 2.6659, 0.5592, NA, 4.2326, 4.3808, 3.624,
4.29, 7.098, 6.532, 3.699, 9.297, 8.275, NA)), class = "data.frame", row.names = c(NA,
-56L))
# The P-value derived from result 1 (p=0.0079) is different from that from result 2 (p=0.003279):
result1 <- wilcox_test(data=Data_pairs, Metabolite~Visit, paired = TRUE)
# A tibble: 1 x 7
#.y. group1 group2 n1 n2 statistic p
#* <chr> <chr> <chr> <int> <int> <dbl> <dbl>
# 1 Metabolite BSL LV 28 28 131 0.0079
result2 <- wilcox.test(data=Data_pairs, Metabolite~Visit, paired = TRUE)
# Wilcoxon signed rank exact test
#data: Metabolite by Visit
#V = 197, p-value = 0.003279
#alternative hypothesis: true location shift is not equal to 0
Using different statistical software, it seems that the wrong P-value is that derived from result2.
Is there any suggestion/advice on how to correct my code, and what is the reason for this difference?
Thank you in advance for your help.
CodePudding user response:
If it is indeed paired data, then you need to ensure you provide the complete pairs, taking your data, we can place them side by side, assuming the your observations 1 to 28 and from the same sample as 29 to 56:
da = do.call(cbind,split(Data_pairs$Metabolite,Data_pairs$Visit))
da
BSL LV
[1,] NA NA
[2,] 9.6020 2.2250
[3,] 9.0102 7.4170
[4,] 4.5240 1.4200
[5,] 3.7500 NA
[6,] NA NA
[7,] 6.5960 2.7520
[8,] 7.0650 6.5040
[9,] 6.8770 7.5940
[10,] NA 6.6520
[11,] NA NA
[12,] 10.1846 NA
[13,] 13.5210 3.7840
[14,] 7.8219 2.7311
[15,] NA 4.1749
[16,] 4.9149 2.6659
[17,] 4.0754 0.5592
[18,] 4.7635 NA
[19,] 8.8554 4.2326
[20,] 4.3442 4.3808
[21,] NA 3.6240
[22,] 16.6590 4.2900
[23,] NA 7.0980
[24,] 3.6980 6.5320
[25,] 6.6230 3.6990
[26,] 5.1580 9.2970
[27,] 11.7190 8.2750
[28,] 3.2060 NA
We test only the complete:
wilcox.test(da[complete.cases(da),1],da[complete.cases(da),2],paired=TRUE)
data: da[complete.cases(da), 1] and da[complete.cases(da), 2]
V = 131, p-value = 0.007904
alternative hypothesis: true location shift is not equal to 0
CodePudding user response:
There’s a few things going on here. The most important is you’re not doing a paired test with coin::wilcox_test
(assuming you're using the coin
package - we can't be sure from your question). There is no paired argument for the function. If we compare coin::wilcox_test
with the exact distribution and the unpaired wilcox.test
, we’ll get the same result. There will also be the same result with the argument paired = TRUE
supplied to coin::wilcox_test
.
coin::wilcox_test(data = Data_pairs, Metabolite ~ Visit, distribution = "exact")
#>
#> Exact Wilcoxon-Mann-Whitney Test
#>
#> data: Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01167
#> alternative hypothesis: true mu is not equal to 0
wilcox.test(data = Data_pairs, Metabolite ~ Visit)
#>
#> Wilcoxon rank sum exact test
#>
#> data: Metabolite by Visit
#> W = 320, p-value = 0.01167
#> alternative hypothesis: true location shift is not equal to 0
coin::wilcox_test(data = Data_pairs, Metabolite ~ Visit, paired = TRUE, distribution = "exact")
#>
#> Exact Wilcoxon-Mann-Whitney Test
#>
#> data: Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01167
#> alternative hypothesis: true mu is not equal to 0
However, the p-values with default options will not be the same between the functions regardless. The p-values derived from coin::wilcox_test
are asymptotic. The p-values from wilcox.test
are exact p-values. You can get exact p-values from coin::wilcox_test
by specifying the distribution = "exact"
as I did above. Comparing the default options, you’ll see they differ.
coin::wilcox_test(data = Data_pairs, Metabolite ~ Visit)
#>
#> Asymptotic Wilcoxon-Mann-Whitney Test
#>
#> data: Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01231
#> alternative hypothesis: true mu is not equal to 0
wilcox.test(data = Data_pairs, Metabolite ~ Visit)
#>
#> Wilcoxon rank sum exact test
#>
#> data: Metabolite by Visit
#> W = 320, p-value = 0.01167
#> alternative hypothesis: true location shift is not equal to 0
I would spend some time looking through the functions with ?coin::wilcox_test
or ?wilcox.test
and making sure you understand what test you want to apply and what results they supply. StupidWolf's answer is great advice for doing a paired test.
CodePudding user response:
To maintain reproducibility you should always use set.seed(1)
in your R code. Maybe this is the case?