I have a massive dataset and am trying to plot a sort of boxplot with the Q1, Q2 and Q3 statistics by category. I would like the standard interquartile-range box with the thicker line marking the median, but without the whiskers and outliers. I would also like to add the average for each category.

Because my data is massive, it would be easier to compute all of these statistics first and then plot them with stat = "identity". I found the code below, which computes the statistics and then plots them. However, it doesn't work when I delete ymin and ymax from the code. I would like similar code that: (i) does not have the max and min, (ii) adds the average as a dot, (iii) computes and plots the statistics by category.
library(ggplot2)

y <- rnorm(100)
# One row of precomputed summary statistics
df <- data.frame(
  x = 1,
  y0 = min(y),
  y25 = quantile(y, 0.25),
  y50 = median(y),
  y75 = quantile(y, 0.75),
  y100 = max(y)
)
# The "+" is required to add the layer to the plot
ggplot(df, aes(x)) +
  geom_boxplot(
    aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
    stat = "identity"
  )
Answer:
Assuming the category is x and you have already calculated the statistics for each category (simulated in the example), you can hide the whiskers by setting ymin to Q1 and ymax to Q3, so that they collapse onto the edges of the box:
library(ggplot2)

set.seed(1234)
# Simulated raw data for two categories
y1 <- rnorm(100)
y2 <- rnorm(100)
# One row of precomputed statistics per category
df <- data.frame(
  x = as.factor(1:2),
  y0 = c(min(y1), min(y2)),
  y25 = c(quantile(y1, 0.25), quantile(y2, 0.25)),
  y50 = c(quantile(y1, 0.5), quantile(y2, 0.5)),
  y75 = c(quantile(y1, 0.75), quantile(y2, 0.75)),
  y100 = c(max(y1), max(y2)),
  mean = c(mean(y1), mean(y2))
)
# Collapse the whiskers onto the box: ymax = Q3, ymin = Q1
df$y100 <- df$y75
df$y0 <- df$y25
ggplot(df, aes(x)) +
  geom_boxplot(
    aes(group = x, ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
    stat = "identity"
  ) +
  geom_point(aes(group = x, y = mean))
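The answer above hard-codes two categories. Below is a minimal sketch of the same approach for any number of categories, computing the per-category statistics with base R (split() and lapply()) before plotting. The data frame raw, its columns cat and value, and the helper summarise_group are hypothetical names used only for illustration; with real data you would replace raw with your own table.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical raw data: 'cat' is the category, 'value' the measurement
raw <- data.frame(
  cat = factor(rep(c("a", "b", "c"), each = 100)),
  value = rnorm(300)
)

# Hypothetical helper: one row of Q1, median, Q3 and mean for a numeric vector
summarise_group <- function(v) {
  q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(y25 = q[1], y50 = q[2], y75 = q[3], mean = mean(v))
}

# Split the values by category and stack the one-row summaries
groups <- split(raw$value, raw$cat)
stats <- do.call(rbind, lapply(groups, summarise_group))
stats$x <- factor(names(groups), levels = names(groups))

# Whiskers hidden by setting ymin = Q1 and ymax = Q3; mean drawn as a red dot
p <- ggplot(stats, aes(x = x)) +
  geom_boxplot(
    aes(group = x, ymin = y25, lower = y25, middle = y50, upper = y75, ymax = y75),
    stat = "identity"
  ) +
  geom_point(aes(y = mean), colour = "red", size = 2)
print(p)
```

Only the small stats table (one row per category) is passed to ggplot(), which is the point of precomputing when the raw data is large.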
Answer:
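You can use stat_summary() to add the mean or other statistics to the plot. For example, add stat_summary(fun = "mean", colour = "red", size = 2, geom = "point"). Note that stat_summary() computes the statistic from the raw data, so the plot has to be built from the full dataset rather than from precomputed summaries.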
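A minimal sketch of a version built entirely with stat_summary(), assuming the raw data fit in memory (which the question suggests may not be the case). Here stat_summary() also draws the box, through a hypothetical helper box_stats that returns the columns geom_boxplot() expects, with ymin and ymax set to Q1 and Q3 so no whiskers are visible. The data frame raw and its columns cat and value are illustrative assumptions.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical raw data with two categories
raw <- data.frame(
  cat = factor(rep(c("a", "b"), each = 100)),
  value = rnorm(200)
)

# Hypothetical helper: box from Q1 to Q3 with the median line;
# ymin/ymax equal the box edges, so the whiskers have zero length
box_stats <- function(y) {
  q <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(ymin = q[1], lower = q[1], middle = q[2], upper = q[3], ymax = q[3])
}

p <- ggplot(raw, aes(x = cat, y = value)) +
  stat_summary(fun.data = box_stats, geom = "boxplot") +
  stat_summary(fun = "mean", colour = "red", size = 2, geom = "point")
print(p)
```

This avoids building a summary table by hand, at the cost of passing every row to ggplot(); for very large data the precomputed stat = "identity" approach is likely lighter.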