I would like to produce a boxplot with wild outliers marked with a distinct symbol, say the asterisk (shape 8). "Wild outliers" are defined as individual points that are greater than Q3 + 3 * IQR or less than Q1 - 3 * IQR for the data set.
I have seen answers for people who want to label their outliers with their value (e.g. Labeling Outliers of Boxplots in R), and geom_boxplot() has a built-in way to modify the style of all outliers, but I haven't found any way to modify only some of the outlier points.
For this MRE, I would like the wild outliers for 8 cylinders to be marked with an asterisk while the other outliers are marked with the usual filled-in dot.
library(ggplot2)
ggplot(data = mtcars, aes(x = cyl, y = drat, group = cyl)) +
  geom_boxplot()
Here is a way to get the "outer fences" beyond which points are considered wild outliers:
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize(lf = quantile(drat, probs = .25) - 3 * IQR(drat),
            uf = quantile(drat, probs = .75) + 3 * IQR(drat))
Thank you!
CodePudding user response:
One option would be to create two separate data frames, one containing the wild outliers and one containing the other outliers, hide the default outlier points, and add both sets to your boxplot via two geom_point() layers.
library(ggplot2)
library(dplyr, warn = FALSE)
# Points beyond the outer fences (3 * IQR) within each cyl group
wild_outliers <- mtcars %>%
  group_by(cyl) %>%
  filter(drat < quantile(drat, probs = .25) - 3 * IQR(drat) |
    drat > quantile(drat, probs = .75) + 3 * IQR(drat))
# Points beyond the inner fences (1.5 * IQR), excluding the wild ones
outliers <- mtcars %>%
  group_by(cyl) %>%
  filter(drat < quantile(drat, probs = .25) - 1.5 * IQR(drat) |
    drat > quantile(drat, probs = .75) + 1.5 * IQR(drat)) |>
  anti_join(wild_outliers)
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
#> "gear", "carb")
ggplot(data = mtcars, aes(x = cyl, y = drat, group = cyl)) +
  geom_boxplot(outlier.colour = NA) +
  geom_point(data = outliers, shape = 16, size = 2) +
  geom_point(data = wild_outliers, shape = "*", size = 8)
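A possible variation on the two-data-frame approach, sketched here rather than tested against every ggplot2 version: classify each row once and map the class to the shape aesthetic, so a single geom_point() layer draws both kinds of outlier and a legend comes for free. The names flagged and outlier_type are made up for this illustration; the fences use the same quantile() and 1.5 / 3 * IQR rules as above.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

# Classify every row against its own cyl group's inner and outer fences.
# `flagged` and `outlier_type` are illustrative names, not part of any API.
flagged <- mtcars %>%
  group_by(cyl) %>%
  mutate(
    q1 = quantile(drat, probs = .25),
    q3 = quantile(drat, probs = .75),
    iqr = q3 - q1,
    outlier_type = case_when(
      drat < q1 - 3 * iqr | drat > q3 + 3 * iqr ~ "wild",
      drat < q1 - 1.5 * iqr | drat > q3 + 1.5 * iqr ~ "mild",
      TRUE ~ "none"
    )
  ) %>%
  ungroup()

ggplot(flagged, aes(x = cyl, y = drat, group = cyl)) +
  # Suppress the default outlier points so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Draw only the flagged rows, with the shape chosen by outlier class
  geom_point(
    data = filter(flagged, outlier_type != "none"),
    aes(shape = outlier_type),
    size = 2.5
  ) +
  # 16 = filled dot, 8 = asterisk
  scale_shape_manual(values = c(mild = 16, wild = 8))
```

Keeping the classification in one column also makes it easy to check which rows were flagged, e.g. with count(flagged, cyl, outlier_type), before plotting.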