adding boxplot outliers after specifying quantiles

Time:10-20

I need to create a custom boxplot in R in which the box and whiskers are drawn from the 0.05, 0.20, 0.50, 0.80 and 0.95 quantiles rather than the defaults.

The default plot was generated using this code:

ggplot(data, aes(Site, LOG10Val)) +
  geom_boxplot()

and looks like this: [image: default boxplot]

To specify the custom bounds of the boxplots, the code I used was:

ggplot(data, aes(Site, LOG10Val)) +
  stat_summary(geom = "boxplot",
               fun.data = function(x) setNames(quantile(x, c(0.05, 0.2, 0.5, 0.8, 0.95)),
                                               c("ymin", "lower", "middle", "upper", "ymax")),
               position = "dodge")

The plot becomes: [image: custom plot]

Is there a way to reintroduce the outliers (i.e. values above the 95th percentile) into the custom boxplot?

Thanks.

Edit: my data structure is as follows:

# A tibble: 6 x 5
  Date       Site  Analyte      Value LOG10Val
  <date>     <fct> <fct>        <dbl>    <dbl>
1 2014-01-10 E     Ammonia_mg.L 0.02     -1.70
2 2014-01-10 C     Ammonia_mg.L 0.01     -2   
3 2014-01-10 D     Ammonia_mg.L 0.015    -1.82
4 2014-01-31 E     Ammonia_mg.L 0.01     -2   
5 2014-01-31 C     Ammonia_mg.L 0.01     -2   
6 2014-01-31 D     Ammonia_mg.L 0.01     -2  

Answer:

One option to achieve your desired result would be to add the outliers via a second stat_summary layer.

Making use of iris as example data:

library(ggplot2)

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # Box and whiskers from the custom quantiles
  stat_summary(
    geom = "boxplot",
    fun.data = function(x) {
      setNames(
        quantile(x, c(0.05, 0.2, 0.5, 0.8, 0.95)),
        c("ymin", "lower", "middle", "upper", "ymax")
      )
    }
  ) +
  # Points for values outside the 5th-95th percentile range
  stat_summary(geom = "point", fun = function(x) {
    outlier_high <- x > quantile(x, .95)
    outlier_low <- x < quantile(x, .05)
    ifelse(outlier_high | outlier_low, x, NA)
  }, na.rm = TRUE)
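
For the data in the question, the same two-layer idea could be sketched as below. The helper names custom_box and flag_outliers are made up for illustration, and the data frame is a random stand-in that only reuses the question's column names Site and LOG10Val; it is not the real data. Note that this flags values below the 5th percentile as well as above the 95th; drop the lower condition if only the high outliers are wanted.

```r
library(ggplot2)

# Hypothetical helper: the five quantiles, named the way geom_boxplot expects.
custom_box <- function(x) {
  setNames(
    quantile(x, c(0.05, 0.2, 0.5, 0.8, 0.95), na.rm = TRUE),
    c("ymin", "lower", "middle", "upper", "ymax")
  )
}

# Hypothetical helper: keep values outside the 5th-95th percentile range,
# replace everything else with NA so the point layer skips it.
flag_outliers <- function(x) {
  limits <- quantile(x, c(0.05, 0.95), na.rm = TRUE)
  ifelse(x < limits[[1]] | x > limits[[2]], x, NA)
}

# Made-up stand-in for the question's data (assumption, not the real values).
set.seed(1)
data <- data.frame(
  Site = factor(rep(c("C", "D", "E"), each = 50)),
  LOG10Val = rnorm(150, mean = -1.8, sd = 0.3)
)

p <- ggplot(data, aes(Site, LOG10Val)) +
  stat_summary(geom = "boxplot", fun.data = custom_box) +
  stat_summary(geom = "point", fun = flag_outliers, na.rm = TRUE)

print(p)
```

Pulling the two functions out of the ggplot call makes it easy to check on a plain vector which values will be drawn as points before plotting.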
