Home > Net >  Different P-values between wilcox_test() & wilcox.test() in R
Different P-values between wilcox_test() & wilcox.test() in R

Time:12-04

While I was trying to perform a paired Wilcoxon test for the variable:"Metabolite", I noticed different P-values between wilcox_test() & wilcox.test(), when I tested the following variable:

structure(list(Visit = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("BSL", "LV"), class = "factor"), Metabolite = c(NA, 
9.602, 9.0102, 4.524, 3.75, NA, 6.596, 7.065, 6.877, NA, NA, 
10.1846, 13.521, 7.8219, NA, 4.9149, 4.0754, 4.7635, 8.8554, 
4.3442, NA, 16.659, NA, 3.698, 6.623, 5.158, 11.719, 3.206, NA, 
2.225, 7.417, 1.42, NA, NA, 2.752, 6.504, 7.594, 6.652, NA, NA, 
3.784, 2.7311, 4.1749, 2.6659, 0.5592, NA, 4.2326, 4.3808, 3.624, 
4.29, 7.098, 6.532, 3.699, 9.297, 8.275, NA)), class = "data.frame", row.names = c(NA, 
-56L))



# The P-value derived from result 1 (p=0.0079) is different from that from result 2 (p=0.003279):


        result1 <- wilcox_test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
        
        # A tibble: 1 x 7
        #.y.     group1 group2    n1    n2 statistic      p
        #* <chr>   <chr>  <chr>  <int> <int>     <dbl>  <dbl>
        #        1 Metabolite BSL    LV        28    28       131 0.0079
        
        
        result2 <- wilcox.test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
        #   Wilcoxon signed rank exact test
        
        #data:  Metabolite by Visit
        #V = 197, p-value = 0.003279
        #alternative hypothesis: true location shift is not equal to 0
 

Using different statistical software, it seems that the wrong P-value is that derived from result2.

Is there any suggestion/advice on how to correct my code, and what is the reason for this difference?

Thank you in advance for your help.

CodePudding user response:

If it is indeed paired data, then you need to ensure you provide the complete pairs, taking your data, we can place them side by side, assuming the your observations 1 to 28 and from the same sample as 29 to 56:

da = do.call(cbind,split(Data_pairs$Metabolite,Data_pairs$Visit))
da

          BSL     LV
 [1,]      NA     NA
 [2,]  9.6020 2.2250
 [3,]  9.0102 7.4170
 [4,]  4.5240 1.4200
 [5,]  3.7500     NA
 [6,]      NA     NA
 [7,]  6.5960 2.7520
 [8,]  7.0650 6.5040
 [9,]  6.8770 7.5940
[10,]      NA 6.6520
[11,]      NA     NA
[12,] 10.1846     NA
[13,] 13.5210 3.7840
[14,]  7.8219 2.7311
[15,]      NA 4.1749
[16,]  4.9149 2.6659
[17,]  4.0754 0.5592
[18,]  4.7635     NA
[19,]  8.8554 4.2326
[20,]  4.3442 4.3808
[21,]      NA 3.6240
[22,] 16.6590 4.2900
[23,]      NA 7.0980
[24,]  3.6980 6.5320
[25,]  6.6230 3.6990
[26,]  5.1580 9.2970
[27,] 11.7190 8.2750
[28,]  3.2060     NA

We test only the complete:

wilcox.test(da[complete.cases(da),1],da[complete.cases(da),2],paired=TRUE)

data:  da[complete.cases(da), 1] and da[complete.cases(da), 2]
V = 131, p-value = 0.007904
alternative hypothesis: true location shift is not equal to 0

CodePudding user response:

There’s a few things going on here. The most important is you’re not doing a paired test with coin::wilcox_test (assuming you're using the coin package - we can't be sure from your question). There is no paired argument for the function. If we compare coin::wilcox_test with the exact distribution and the unpaired wilcox.test, we’ll get the same result. There will also be the same result with the argument paired = TRUE supplied to coin::wilcox_test.

coin::wilcox_test(data = Data_pairs,  Metabolite ~ Visit, distribution = "exact")
#> 
#>  Exact Wilcoxon-Mann-Whitney Test
#> 
#> data:  Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01167
#> alternative hypothesis: true mu is not equal to 0
wilcox.test(data = Data_pairs, Metabolite ~ Visit)
#> 
#>  Wilcoxon rank sum exact test
#> 
#> data:  Metabolite by Visit
#> W = 320, p-value = 0.01167
#> alternative hypothesis: true location shift is not equal to 0
coin::wilcox_test(data = Data_pairs,  Metabolite ~ Visit, paired = TRUE, distribution = "exact")
#> 
#>  Exact Wilcoxon-Mann-Whitney Test
#> 
#> data:  Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01167
#> alternative hypothesis: true mu is not equal to 0

However, the p-values with default options will not be the same between the functions regardless. The p-values derived from coin::wilcox_test are asymptotic. The p-values from wilcox.test are exact p-values. You can get exact p-values from coin::wilcox_test by specifying the distribution = "exact" as I did above. Comparing the default options, you’ll see they differ.

coin::wilcox_test(data = Data_pairs,  Metabolite ~ Visit)
#> 
#>  Asymptotic Wilcoxon-Mann-Whitney Test
#> 
#> data:  Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01231
#> alternative hypothesis: true mu is not equal to 0
wilcox.test(data = Data_pairs,  Metabolite ~ Visit)
#> 
#>  Wilcoxon rank sum exact test
#> 
#> data:  Metabolite by Visit
#> W = 320, p-value = 0.01167
#> alternative hypothesis: true location shift is not equal to 0

I would spend some time looking through the functions with ?coin::wilcox_test or ?wilcox.test and making sure you understand what test you want to apply and what results they supply. StupidWolf's answer is great advice for doing a paired test.

CodePudding user response:

To maintain reproducibility you should always use set.seed(1) in your R code. Maybe this is the case?

  • Related