How can I add statistical results for my data above a ggplot bar graph?-CodePudding

I am using a Kruskal-Wallis test to determine significant differences between treatments. Since this test is not an ANOVA, I cannot use the traditional post-hoc mean separation letters above the bar graph. There are some groups that are statistically different than others, and I would like to indicate the groups that are statistically different from one another on the graph.

Here is the data I am working with:

example <- data.frame(trt = c(5,
                              1,
                              2,
                              3,
                              4,
                              5,
                              2,
                              1,
                              3,
                              4,
                              1,
                              4,
                              5,
                              3,
                              2,
                              4,
                              1,
                              3,
                              5,
                              2,
                              4,
                              2,
                              1,
                              3,
                              5),
                      result=c(143278.25,
                               143290.75,
                               143275.75,
                               143298.25,
                               143298.25,
                               143285.75,
                               143285.75,
                               143304.5,
                               143304.5,
                               143302,
                               143300.75,
                               143303.25,
                               143298.25,
                               143304.5,
                               143285.75,
                               143304.5,
                               143304.5,
                               143300.75,
                               143302,
                               143288.25,
                               143304.5,
                               143289.5,
                               143304.5,
                               143298.75,
                               143300.75))

First, run the Kruskal-Wallace test:

library(FSA)
library(ggplot2)
library(dplyr)

kruskal.test(result ~ trt, data = example)

Results:

Kruskal-Wallis rank sum test

data:  result by trt
Kruskal-Wallis chi-squared = 13.266, df = 4, p-value = 0.01005

These results indicate significant differences between groups, so the Dunn test will be used as a post-hoc test:

example$trt <- as.factor(example$trt)
dunnTest(result ~ trt, data = example, method = "bonferroni")

Results of this test indicate significance between groups 1-2, 2-3, and 2-4:

   Comparison           Z     P.unadj      P.adj
1       1 - 2  2.87420990 0.004050397 0.04050397
2       1 - 3  0.34838908 0.727548003 1.00000000
3       2 - 3 -2.52582083 0.011542833 0.11542833
4       1 - 4 -0.02177432 0.982627981 1.00000000
5       2 - 4 -2.89598422 0.003779714 0.03779714
6       3 - 4 -0.37016340 0.711260748 1.00000000
7       1 - 5  1.80726835 0.070720448 0.70720448
8       2 - 5 -1.06694156 0.285998228 1.00000000
9       3 - 5  1.45887927 0.144598340 1.00000000
10      4 - 5  1.82904267 0.067393217 0.67393217

Generating the bar graph:

sem <- function(x) sd(x)/sqrt(length(x)) # Function for standard error of the mean

avg_data <- example %>% 
  group_by(trt) %>% 
  summarise(data = mean(result), .groups = 'drop')

sem_data <- example %>% 
  group_by(trt) %>% 
  summarise(data = sem(result), .groups = 'drop')

ggplot(avg_data, aes(x = trt, y = data)) 
  geom_bar(stat="identity", width = 0.2, position = "dodge", col = "black", fill = "blue") 
  geom_errorbar(aes(ymin = data, ymax = data   sem_data$data), width = 0.2, position = position_dodge(0.6)) 
  xlab("Study") ylab("data")

I would like to add the p-values for the significantly differnet groups with brackets above the bars, as illustrated in the picture below:

Is there a way that this can be done?

CodePudding user response：

I think a bar plot is a poor choice here. Because of the y axis scale, the bar heights all look the same. Showing only the mean of a variable using a bar plot also hides useful information about the distribution and sample size that could easily be displayed. This has led many people to advocate