Adding breaks on y-axis in box-plot

Time:10-03

How can I add a break on the y axis (hiding 4000 to 6000) to show the variability of the other box plots better?

Is there a way to also add a sign on the y-axis that indicates the gap?

ggplot(df, aes(x=reorder(class, percent), y=percent, fill=class)) +
   geom_boxplot()


Here is the data:

df <- 
structure(list(Sector = c("coal", "crops", "electricity", "energy intensive industries", 
"forestry", "livestock", "oil and gas", "refined oil", "transportation", 
"coal", "crops", "electricity", "energy intensive industries", 
"forestry", "livestock", "oil and gas", "refined oil", "transportation", 
"coal", "crops", "electricity", "energy intensive industries", 
"forestry", "livestock", "oil and gas", "refined oil", "transportation", 
"coal", "crops", "electricity", "energy intensive industries", 
"forestry", "livestock", "oil and gas", "refined oil", "transportation", 
"coal", "crops", "electricity", "energy intensive industries", 
"forestry", "livestock", "oil and gas", "refined oil", "transportation", 
"coal", "crops", "electricity", "energy intensive industries", 
"forestry", "livestock", "oil and gas", "refined oil", "transportation"
), percent = c(152.85, 16.53, 31.531, 113.515, 27.303, 82.995, 
75.215, 147.322, -0.13, 0.576, 113.84, -1.106, 73.221, 2.333, 
1979.688, 95.781, 69.708, -0.871, 96.653, 143.812, 31.335, 80.239, 
61.854, 97.244, 243.102, 448.092, -0.05, 96.653, 143.812, 31.386, 
68.289, 61.854, 97.244, 2020.017, 322.76, -40.72, 1118.54, 484.989, 
58.757, 1203.812, 0.001, 544.68, 3545.212, 6912.517, 0.731, 1449.567, 
143.812, 1.086, 495.693, 239.69, 97.244, 456.202, 79.635, -0.083
), class = structure(c(6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L, 6L, 
7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L, 6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 
1L, 6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L, 6L, 7L, 2L, 4L, 3L, 5L, 
9L, 8L, 1L, 6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L), .Label = c("transportation", 
"electricity", "forestry", "energy intensive industries", "livestock", 
"coal", "crops", "refined oil", "oil and gas"), class = "factor")), row.names = c(NA, 
-54L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

You have several options to choose from. The first is to limit the range of the y axis. The cost is that values outside the limits are dropped before the boxes are computed, so you lose the extreme outlier and, with a lower limit of 0, the negative values as well.

library(ggplot2)
library(dplyr)  # provides %>%

df %>% ggplot(aes(x=reorder(class, percent), y=percent, fill=class)) +
  geom_boxplot() +
  ylim(0,4000)
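
If the boxes should still be computed from all observations, one alternative is to zoom with `coord_cartesian()` instead of setting scale limits. The sketch below uses a small made-up data frame (`toy` is not the question's data) to show that the two approaches produce different medians for the group containing the extreme value.

```r
library(ggplot2)

# Hypothetical toy data: group "b" contains one extreme value.
toy <- data.frame(
  class   = rep(c("a", "b"), each = 5),
  percent = c(10, 20, 30, 40, 50, 100, 200, 300, 400, 7000)
)

# ylim() sets scale limits: values outside 0..4000 are turned into NA
# and removed before the boxplot statistics are computed.
p_ylim <- ggplot(toy, aes(x = class, y = percent, fill = class)) +
  geom_boxplot() +
  ylim(0, 4000)

# coord_cartesian() only zooms the view: every value still enters the statistics.
p_zoom <- ggplot(toy, aes(x = class, y = percent, fill = class)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 4000))

# Medians that ggplot2 computed for group "b" (second row of the layer data).
median_ylim <- layer_data(p_ylim)$middle[2]
median_zoom <- layer_data(p_zoom)$middle[2]
```

With `ylim()` the median of group "b" is computed from four values, with `coord_cartesian()` from all five, so the box itself changes and not only the visible range.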


The second option is to transform the y axis, e.g. with log10, although boxplots on a log-scaled axis are admittedly a bit harder to read.

df %>% ggplot(aes(x=reorder(class, percent), y=percent, fill=class)) +
  geom_boxplot() +
  scale_y_continuous(trans = 'log10') +
  annotation_logticks(sides="l")
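
The data contains zero-adjacent and negative values, for which `log10` is undefined, so those observations are removed from a log-scaled plot. A possible workaround, sketched here as an assumption rather than as part of the original answer, is `pseudo_log_trans()` from the scales package, which is roughly linear near zero and logarithmic for large magnitudes. The data frame `toy` and the chosen breaks are made up for illustration.

```r
library(ggplot2)
library(scales)

# Hypothetical toy data with negative, small, and very large values.
toy <- data.frame(
  class   = rep(c("a", "b"), each = 4),
  percent = c(-40.72, -0.13, 0.731, 97.244, 322.76, 448.092, 3545.212, 6912.517)
)

# Signed pseudo-logarithm: defined for zero and negative input.
pl <- pseudo_log_trans(sigma = 1, base = 10)

p_pseudo <- ggplot(toy, aes(x = class, y = percent, fill = class)) +
  geom_boxplot() +
  scale_y_continuous(
    trans  = pl,
    breaks = c(-10, 0, 10, 100, 1000, 10000)
  )

# Where a negative value and zero land on the transformed axis.
neg_pos  <- pl$transform(-40.72)
zero_pos <- pl$transform(0)
```

The `sigma` argument controls how wide the near-linear region around zero is; the value 1 is only a starting point.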


The last option is to define your own transformation. The one I created leaves values below 2000 unchanged and compresses everything above 2000 by a factor of 10.

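library(scales)
# Forward transform: identity below 2000, 10x compression above.
fspec = function(x) ifelse(x<2000, x, 2000 + (x-2000)/10)
# Inverse transform: undoes the compression above 2000.
fspec_1 = function(x) ifelse(x<2000, x, 2000 + (x-2000)*10)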

specTrans = trans_new(name = "specialTras",
                      transform = fspec,
                      inverse = fspec_1,
                      breaks = c(0, 1000, 2000, 3000, 4000, 5000, 6000))

df %>% ggplot(aes(x=reorder(class, percent), y=percent, fill=class)) +
  geom_boxplot() +
  coord_trans(y = specTrans)
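
A custom transformation only works if the inverse really undoes the forward mapping. The base R sketch below redefines the same piecewise mapping (identity below 2000, compression by 10 above) and checks the round trip on a few values taken from the question's data; the names `x`, `transformed`, `round_trip`, and `max_error` are introduced here for illustration.

```r
# Piecewise mapping as described: identity below 2000, 10x compression above.
fspec   <- function(x) ifelse(x < 2000, x, 2000 + (x - 2000) / 10)
fspec_1 <- function(x) ifelse(x < 2000, x, 2000 + (x - 2000) * 10)

# A few values from the data, on both sides of the 2000 threshold.
x <- c(-40.72, 0, 1979.688, 2000, 2020.017, 3545.212, 6912.517)

# Position of each value on the transformed axis.
transformed <- fspec(x)

# Applying the inverse should recover the original values up to rounding error.
round_trip <- fspec_1(transformed)
max_error  <- max(abs(round_trip - x))
```

On the transformed axis 6000 sits at 2400, so the range from 2000 to 6000 takes up as much space as the range from 0 to 400.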
