Is it possible to customize outliers calculation formula in ggplot?

Time:01-07

library(ggplot2)
library(dplyr)  # provides the %>% pipe

mydata3 <- data.frame(x = c(1:10, 20, 100, 120))

mydata3 %>% ggplot(aes(x = x)) +
  geom_boxplot()


What formula does ggplot use to calculate outliers? Is it Quartile1 - 1.5*IQR and Quartile3 + 1.5*IQR? If so, can I manually change this rule?

CodePudding user response:

Does something like this work?

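mydata3 <- data.frame(x = c(1:10, 20, 100, 120))

# coef sets the whisker length as a multiple of the IQR (default 1.5).
# stat_boxplot() draws with the boxplot geom by default, so an extra
# geom_boxplot() layer would only redraw the default 1.5 rule on top.
mydata3 %>% ggplot(aes(x = x)) +
  stat_boxplot(coef = 5)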

CodePudding user response:

Yes, your calculation for outliers is correct. From the docs:

The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles).

and

The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.
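To see what that rule gives for the data in the question, here is a minimal base R sketch that applies it by hand. The variable names (coef, lower_fence, upper_fence) are just for illustration, and it assumes the hinges are the plain 25th and 75th percentiles from quantile(), as the docs describe:

```r
x <- c(1:10, 20, 100, 120)

# Hinges: first and third quartiles (25th and 75th percentiles).
hinges <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- hinges[2] - hinges[1]

# Multiplier for the whisker length; 1.5 is the documented default.
coef <- 1.5
lower_fence <- hinges[1] - coef * iqr
upper_fence <- hinges[2] + coef * iqr

# Points beyond the fences are drawn as outliers;
# the whiskers stop at the most extreme values inside the fences.
outliers <- x[x < lower_fence | x > upper_fence]
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

outliers  # 20 100 120
whiskers  # 1 10
```

Raising coef to 5 moves the upper fence from 19 to 40, so 20 falls inside the whisker and only 100 and 120 remain outliers.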

To change the positions of the whiskers and hinges, we can use stat = "summary" inside geom_boxplot, and pass our own summary function. This needs to be a function that takes the passed values and returns a data frame with columns ymin, lower, middle, upper and ymax.

In this example, we set the whisker tips at 10th and 90th centile, the hinges at 40th and 60th centile, and the middle line at the 50th centile:

mydata3 <- data.frame(x=(c(1:10,20,100,120)))

mydata3 %>% 
  ggplot(aes(y = 0, x = x)) +
  geom_boxplot(stat = "summary",
               fun.data = ~ data.frame(ymin   = quantile(.x, 0.1),
                                       lower  = quantile(.x, 0.4),
                                       middle = quantile(.x, 0.5),
                                       upper  = quantile(.x, 0.6),
                                       ymax   = quantile(.x, 0.9)),
               orientation = "y") 


Note that if we change the quantiles in the above function, we get a different boxplot:

mydata3 %>% 
  ggplot(aes(y = 0, x = x)) +
  geom_boxplot(stat = "summary",
               fun.data = ~ data.frame(ymin   = quantile(.x, 0.05),
                                       lower  = quantile(.x, 0.1),
                                       middle = quantile(.x, 0.5),
                                       upper  = quantile(.x, 0.9),
                                       ymax   = quantile(.x, 0.95)),
               orientation = "y") 

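One thing to be aware of: a boxplot built with stat = "summary" only draws the five values the summary function returns, so no outlier points appear. If you also want points beyond your custom whiskers, one possible approach is to compute them yourself and add them as a separate layer. This is only a sketch; whisker_probs, box_summary and outlier_pts are names made up for this example, and the 10th/90th centile rule is an arbitrary choice:

```r
library(ggplot2)

mydata3 <- data.frame(x = c(1:10, 20, 100, 120))

# Hypothetical rule: whisker tips at the 10th and 90th centiles;
# anything beyond them is treated as an outlier.
whisker_probs <- c(0.1, 0.9)
limits <- quantile(mydata3$x, whisker_probs, names = FALSE)
outlier_pts <- subset(mydata3, x < limits[1] | x > limits[2])

# Summary function returning the five columns the boxplot geom needs.
box_summary <- function(v) {
  q <- quantile(v, c(whisker_probs[1], 0.25, 0.5, 0.75, whisker_probs[2]),
                names = FALSE)
  data.frame(ymin = q[1], lower = q[2], middle = q[3],
             upper = q[4], ymax = q[5])
}

p <- ggplot(mydata3, aes(y = 0, x = x)) +
  geom_boxplot(stat = "summary", fun.data = box_summary,
               orientation = "y") +
  # Points outside the custom whiskers, drawn as their own layer.
  geom_point(data = outlier_pts)

p
```

Because the outlier rule lives in ordinary R code rather than inside ggplot, any formula can be swapped in by changing how limits is computed.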
