Obtain statistics iteratively in R

I have a data frame with several variables. One of them is continuous and another one is categorical.

I want to run a Wilcoxon test between these two variables, which is basically a way to check whether the continuous variable differs between two groups of samples. This is really easy when you know which levels of the factor you want to compare.

For example, the Arthritis dataset (from the vcd package) has 5 variables: ID, Treatment, Sex, Age and Improved.

The variable Improved has three levels: Marked, Some and None. We would like to see whether Age differs between the Marked and the None patients.

This is pretty straightforward: we would just need to create two vectors, one with the ages of the Marked patients and one with the ages of the None patients, and then apply wilcox.test to them.
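As a rough sketch of that manual comparison, the snippet below uses the built-in iris data as a stand-in (this is an assumption made so the code runs without installing vcd; with Arthritis the two vectors would be the Age values for Improved == "Marked" and Improved == "None"). The names x, y and res are only illustrative.

```r
# Two-group Wilcoxon rank-sum test done by hand.
# iris is a stand-in here: Species plays the role of Improved,
# Sepal.Length plays the role of Age.
x <- iris$Sepal.Length[iris$Species == "setosa"]
y <- iris$Sepal.Length[iris$Species == "virginica"]

# exact = FALSE requests the normal approximation, which is what
# wilcox.test needs anyway when the data contain ties.
res <- wilcox.test(x, y, exact = FALSE)

# The p-value is stored in the returned "htest" object.
res$p.value
```

This works for one pair, but it has to be repeated by hand for every other pair of levels.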

However, I would like to create a script that automatically obtains the p-values of all the other levels against one chosen level of the variable.

So, for example, you would choose None, then iterate over the other levels of the variable and store the p-value of each comparison.

CodePudding user response:

Probably the easiest way to achieve this would be to just use stats::pairwise.wilcox.test() to calculate all pairwise tests for a factor. Then extract only the results of interest.

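# Unadjusted p-values for every level compared with the first (reference) level
multiple_wilcox <- function(response, group) {
  # $p.value is a lower-triangular matrix; column 1 holds the tests against the reference level
  pairwise.wilcox.test(response, group, p.adjust.method = "none")$p.value[, 1]
}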

# By default, each level is tested against the reference (first) level
with(iris, multiple_wilcox(Sepal.Length, Species))
#>   versicolor    virginica 
#> 8.345827e-14 6.396699e-17

# ... which can be changed with `relevel()`
with(iris, multiple_wilcox(Sepal.Length, relevel(Species, "virginica")))
#>       setosa   versicolor 
#> 6.396699e-17 5.869006e-07
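
If you would rather see the iteration spelled out, the same p-values can be obtained with an explicit loop over the levels. This is only a sketch: wilcox_vs_ref, its arguments and the variable names are hypothetical, not part of any package. It runs wilcox.test only for the comparisons against the chosen level instead of for every pair.

```r
# Hypothetical helper: test every other level of `group` against `ref`
# and return a named vector of unadjusted p-values.
wilcox_vs_ref <- function(response, group, ref = NULL, ...) {
  group <- droplevels(as.factor(group))
  if (is.null(ref)) ref <- levels(group)[1]   # default: first level
  stopifnot(ref %in% levels(group))

  others <- setdiff(levels(group), ref)
  # vapply keeps the level names as names of the result
  vapply(others, function(lv) {
    wilcox.test(response[group == ref], response[group == lv], ...)$p.value
  }, numeric(1))
}

# Same comparisons as above, chosen by name instead of with relevel()
p_raw <- wilcox_vs_ref(iris$Sepal.Length, iris$Species, ref = "virginica")

# Optional: correct for multiple testing over just these comparisons
p_adj <- p.adjust(p_raw, method = "holm")
```

One difference worth knowing: setting p.adjust.method inside pairwise.wilcox.test() adjusts over all pairs of levels, whereas calling p.adjust() on the loop result adjusts only over the comparisons against the chosen level.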