I have a data frame with several variables. One of them is continuous and another is categorical.
I want to run a Wilcoxon test between these two variables, which is a non-parametric test for a difference between two groups of samples. This is easy when you know which levels you want to compare.
For example, the Arthritis dataset has 5 variables.
The variable Improved has three levels: Marked, Some and None.
We would like to see whether the variable Age differs between the Marked and None patients.
This is pretty straightforward: we would just need to create two vectors, one with the Age of the Marked patients and one with the Age of the None patients, and then apply wilcox.test.
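For reference, a minimal sketch of that two-vector approach. The data frame `arthritis_toy` is a made-up stand-in (random ages, not the real Arthritis data), used only so the snippet runs on its own; the column and level names are assumed to match the dataset described above.

```r
# Toy data standing in for the Age and Improved columns (values are made up)
set.seed(1)
arthritis_toy <- data.frame(
  Age      = round(runif(60, min = 25, max = 75)),
  Improved = factor(sample(c("None", "Some", "Marked"), 60, replace = TRUE),
                    levels = c("None", "Some", "Marked"))
)

# One vector of ages per group of interest
age_marked <- arthritis_toy$Age[arthritis_toy$Improved == "Marked"]
age_none   <- arthritis_toy$Age[arthritis_toy$Improved == "None"]

# Two-sample Wilcoxon rank-sum test; exact = FALSE requests the normal
# approximation, which avoids the warning about ties in rounded ages
res <- wilcox.test(age_marked, age_none, exact = FALSE)
res$p.value
```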
However, I would like to write a script that automatically computes the p-values for every comparison against a chosen level of the variable.
So, for example, you would choose None, then iterate over the other levels of the variable and store the p-values.
CodePudding user response:
Probably the easiest way to achieve this is to use stats::pairwise.wilcox.test() to calculate all pairwise tests for a factor, then extract only the results of interest.
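# Keep only the first column of the p-value matrix: each remaining level
# tested against the first (reference) level of `factor`
multiple_wilcox <- function(response, factor) {
  pairwise.wilcox.test(response, factor, p.adjust.method = "none")$p.value[, 1]
}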
# By default, tests are run against the reference (first) level
with(iris, multiple_wilcox(Sepal.Length, Species))
#> versicolor virginica
#> 8.345827e-14 6.396699e-17
# ... which can be changed with `relevel()`
with(iris, multiple_wilcox(Sepal.Length, relevel(Species, "virginica")))
#> setosa versicolor
#> 6.396699e-17 5.869006e-07
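If you would rather spell out the loop described in the question, here is a rough sketch. The helper name `wilcox_vs_reference` and its arguments are hypothetical (not part of any package); it just calls wilcox.test() once per non-reference level and collects the p-values in a named vector.

```r
# Hypothetical helper: test every other level against a chosen reference level
wilcox_vs_reference <- function(response, group, ref, p.adjust.method = "none") {
  group <- droplevels(factor(group))
  stopifnot(ref %in% levels(group))

  # All levels except the reference one
  others <- setdiff(levels(group), ref)

  # One two-sample test per remaining level; names come from `others`
  p <- vapply(others, function(lvl) {
    wilcox.test(response[group == lvl], response[group == ref],
                exact = FALSE)$p.value
  }, numeric(1))

  # Optional multiplicity correction over these comparisons only
  p.adjust(p, method = p.adjust.method)
}

p_vals <- with(iris, wilcox_vs_reference(Sepal.Length, Species, ref = "virginica"))
p_vals
```

One difference to keep in mind: with a correction method other than "none", this sketch adjusts only for the comparisons against the reference level, whereas pairwise.wilcox.test() adjusts over all pairs of levels before you subset the matrix.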