I would like to perform a loop in r, where if the p value of a shaprio-wilk test is less than 0.05, It performs a kruskall-wallis test, but if it is greater than 0.05, a one-way ANOVA is performed. Extending this, if the result of the kruskall-wallis or one-way ANOVA p, value i s < 0.05, a post-hoc test is performed. Would this be possible in r?
#Example columns
Label <- c("blue", "red", "green", "blue", "red", "green","blue", "red", "green","blue", "red", "green","blue", "red", "green")
n <- c(10, 223, 12890, 34, 78, 1902, 34, 211, 1007,209, 330, 90446, 801, 1029, 9011)
#make example data frame
data <- data.frame(Label,n)
#shapiro-wilk test for normality
shapiro.test(data$n)
#if p-value for shapiro-wilk is >0.05, perform one-way ANOVA
OneWay <- lm(n ~ Label, data = data)
anova(OneWay)
#if one-way ANOVA is < 0.05, perform post-hoc
library(mosaic)
TukeyHSD(OneWay)
#If shapiro-wilk < 0.05, perform Krushkall-wallis
kruskal <- kruskal.test(data$n, data$Label) #dependent variable followed by the category (predictor) variable
#if krushkall-wallis is < 0.05, perform post-hoc
library(mosaic)
TukeyHSD(kruskal)
CodePudding user response:
One approach
- load libraries and prepare sample data:
library(dplyr)
library(mosaic)
library(tidyr)
df <- data.frame(Label = c("blue", "red", "green", "blue", "red", "green","blue", "red",
"green","blue", "red", "green","blue", "red", "green"),
n = c(10, 223, 12890, 34, 78, 1902, 34, 211, 1007,209, 330,
90446, 801, 1029, 9011)
)
The following exploits the advantage that a dataframe's "cells" cannot only hold atomic values but also lists (which makes the respective column a "list column"). You can thus also store, e.g. an linear model's output in a cell, if you wrap it in a list
. Take care, though, to extract the list item before mutate
ing it (hence the various [[1]]
in below code.
df |>
summarise(is_normal = shapiro.test(n)[["p.value"]] > .5,
lin_mod = list(lm(n ~ Label)),
p_between_groups = ifelse(is_normal,
anova(lin_mod[[1]])[["Pr(>F)"]][1],
kruskal.test(n ~ Label)[["p.value"]]
)
) |>
mutate(test_type = ifelse(is_normal, "anova", "kruskal"),
p_tukey = ifelse(is_normal & test_type == "anova",
list(TukeyHSD(lin_mod[[1]])[["Label"]][, "p adj"]),
NA
)
) |>
unnest_longer(p_tukey)
output:
# A tibble: 3 x 6
p_shapiro lin_mod p_between_groups test_between p_tukey p_tukey_id
<dbl> <list> <dbl> <chr> <dbl> <chr>
1 0.000000459 <lm> 0.00695 kruskal 0.265 green-blue
2 0.000000459 <lm> 0.00695 kruskal 1.00 red-blue
3 0.000000459 <lm> 0.00695 kruskal 0.270 red-green
(note that I forced Tukey's test which wouldn't be triggered by your sample data)
CodePudding user response:
Here you go:
We can do it with a simple if
else
statetment:
The challenge is to access the p-value
of the different tests (all are shown in the code (for me the overall p-value of lm was the most challenging).
Note: You are asking for a TukeyHSD
test if krushkall-wallis is < 0.05
in your code. In this case TukeyHSD is not appropriate as it is performed after ANOVA.
After kruskal wallis test there are other posthoc tests, one of is the Bonferroni (= Dunn Procedure) which can be used in ANOVA and Kruskal Wallis in equally sample sizes.
#Example columns
Label <- c("blue", "red", "green", "blue", "red", "green","blue", "red", "green","blue", "red", "green","blue", "red", "green")
n <- c(10, 223, 12890, 34, 78, 1902, 34, 211, 1007,209, 330, 90446, 801, 1029, 9011)
data <- data.frame(Label,n)
# perform shapiro wilk test
shap <- shapiro.test(data$n)
# 1. if p-value for shapiro-wilk is > 0.05, perform one-way ANOVA
if(shap$p.value > 0.05){
OneWay <- lm(n ~ Label, data = data)
anova(OneWay)
}
###############################################################
# 2. if one-way ANOVA is < 0.05, perform post-hoc
# get overall p-value of model
f <- summary(OneWay)$fstatistic
oneway.pvalue <- unname(pf(f[1],f[2],f[3],lower.tail=F))
oneway.pvalue
# [1] 0.2083276
if(oneway.pvalue < 0.05){
TukeyHSD(OneWay)
}
###############################################################
# 3. If shapiro-wilk < 0.05, perform Krushkall-wallis
if(shap$p.value < 0.05){
kt <- kruskal.test(n ~ Label, data = data )
kt
}
# Kruskal-Wallis rank sum test
# data: n by Label
# Kruskal-Wallis chi-squared = 9.9377, df = 2, p-value = 0.006951
###############################################################
# 4. if krushkall-wallis is < 0.05, perform post-hoc
if(kt$p.value < 0.05){
#install.packages("FSA")
library(FSA)
DT = dunnTest(n ~ Label,
data=data,
method="bh")
DT
}
# DT
# Dunn (1964) Kruskal-Wallis multiple comparison
# p-values adjusted with the Benjamini-Hochberg method.
# Comparison Z P.unadj P.adj
# 1 blue - green -3.114051 0.001845373 0.005536119
# 2 blue - red -1.132382 0.257473719 0.257473719
# 3 green - red 1.981669 0.047516285 0.071274427