How can I add a break on the y axis (hiding 4000 to 6000) to show the variability of the other box plots better?
Is there a way to also add a sign on the y axis that marks the gap?
ggplot(df, aes(x = reorder(class, percent), y = percent, fill = class)) +
  geom_boxplot()
Here is the data:
df <-
structure(list(Sector = c("coal", "crops", "electricity", "energy intensive industries",
"forestry", "livestock", "oil and gas", "refined oil", "transportation",
"coal", "crops", "electricity", "energy intensive industries",
"forestry", "livestock", "oil and gas", "refined oil", "transportation",
"coal", "crops", "electricity", "energy intensive industries",
"forestry", "livestock", "oil and gas", "refined oil", "transportation",
"coal", "crops", "electricity", "energy intensive industries",
"forestry", "livestock", "oil and gas", "refined oil", "transportation",
"coal", "crops", "electricity", "energy intensive industries",
"forestry", "livestock", "oil and gas", "refined oil", "transportation",
"coal", "crops", "electricity", "energy intensive industries",
"forestry", "livestock", "oil and gas", "refined oil", "transportation"
), percent = c(152.85, 16.53, 31.531, 113.515, 27.303, 82.995,
75.215, 147.322, -0.13, 0.576, 113.84, -1.106, 73.221, 2.333,
1979.688, 95.781, 69.708, -0.871, 96.653, 143.812, 31.335, 80.239,
61.854, 97.244, 243.102, 448.092, -0.05, 96.653, 143.812, 31.386,
68.289, 61.854, 97.244, 2020.017, 322.76, -40.72, 1118.54, 484.989,
58.757, 1203.812, 0.001, 544.68, 3545.212, 6912.517, 0.731, 1449.567,
143.812, 1.086, 495.693, 239.69, 97.244, 456.202, 79.635, -0.083
), class = structure(c(6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L, 6L,
7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L, 6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L,
1L, 6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L, 6L, 7L, 2L, 4L, 3L, 5L,
9L, 8L, 1L, 6L, 7L, 2L, 4L, 3L, 5L, 9L, 8L, 1L), .Label = c("transportation",
"electricity", "forestry", "energy intensive industries", "livestock",
"coal", "crops", "refined oil", "oil and gas"), class = "factor")), row.names = c(NA,
-54L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
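You have several options to choose from. The first is to limit the range of the y axis. The cost is that points outside the limits are no longer shown, which is usually a small loss. Note that ylim() removes out-of-range observations (here the value above 4000 and the negative values) before the box statistics are computed, and ggplot2 warns about the removed rows.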
library(ggplot2)
library(dplyr)  # provides %>%
df %>% ggplot(aes(x = reorder(class, percent), y = percent, fill = class)) +
  geom_boxplot() +
  ylim(0, 4000)
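If the goal is only to zoom in, `coord_cartesian()` may be a gentler alternative: it clips the view without discarding observations, so the box statistics stay the same. A minimal sketch, using a small made-up data frame (`toy` is hypothetical, not the data from the question):

```r
library(ggplot2)

# Hypothetical toy data: group "b" contains one extreme value
toy <- data.frame(
  class   = rep(c("a", "b"), each = 5),
  percent = c(10, 20, 30, 40, 50, 100, 200, 300, 400, 6900)
)

# coord_cartesian() only zooms the view; every row still feeds the statistics
p_zoom <- ggplot(toy, aes(x = class, y = percent, fill = class)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 4000))

p_zoom
```

The extreme point is simply outside the visible window here, rather than dropped from the data.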
The second option is to transform the y axis, e.g. with log10. Admittedly, box plots on a transformed axis are a bit harder to read.
df %>% ggplot(aes(x = reorder(class, percent), y = percent, fill = class)) +
  geom_boxplot() +
  scale_y_continuous(trans = 'log10') +
  annotation_logticks(sides = "l")
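One caveat: the data contain zero-adjacent and negative values, which log10 cannot represent, so ggplot2 drops them with a warning. A possible workaround is the pseudo-log transformation from the scales package, which behaves like a log for large values and is roughly linear near zero. This is a sketch on made-up data (`toy` and the chosen breaks are assumptions for illustration):

```r
library(ggplot2)
library(scales)

# Hypothetical toy data with a negative value and a large outlier
toy <- data.frame(
  class   = rep(c("a", "b"), each = 5),
  percent = c(-40, 0, 30, 100, 500, 100, 200, 300, 400, 6900)
)

# pseudo_log_trans() is defined for negative values, unlike log10
p_pseudo <- ggplot(toy, aes(x = class, y = percent, fill = class)) +
  geom_boxplot() +
  scale_y_continuous(
    trans  = pseudo_log_trans(base = 10),
    breaks = c(-10, 0, 10, 100, 1000)
  )

p_pseudo
```

`annotation_logticks()` is left out on purpose, since its tick marks assume a true log10 axis.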
The last option is to create your own transformation. The function below leaves values under 2000 unchanged and compresses everything above 2000 by a factor of 10.
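library(scales)
# Forward transform: identity below 2000, values above 2000 compressed tenfold
fspec <- function(x) ifelse(x < 2000, x, 2000 + (x - 2000) / 10)
# Inverse transform: undoes the compression above 2000
fspec_1 <- function(x) ifelse(x < 2000, x, 2000 + (x - 2000) * 10)
specTrans <- trans_new(name = "specialTrans",
                       transform = fspec,
                       inverse = fspec_1)
df %>% ggplot(aes(x = reorder(class, percent), y = percent, fill = class)) +
  geom_boxplot() +
  # Breaks are set on the scale; trans_new() expects a function for `breaks`
  scale_y_continuous(breaks = c(0, 1000, 2000, 3000, 4000, 5000, 6000)) +
  coord_trans(y = specTrans)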
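A custom transformation only works if the inverse really undoes the forward function. A quick base R check of a piecewise pair like the one above could look like this (the test values in `vals` are picked for illustration, and the functions are restated so the snippet runs on its own):

```r
# Forward transform: identity below 2000, compressed tenfold above
fspec   <- function(x) ifelse(x < 2000, x, 2000 + (x - 2000) / 10)
# Inverse transform: expands the compressed part back out
fspec_1 <- function(x) ifelse(x < 2000, x, 2000 + (x - 2000) * 10)

# Values on both sides of the 2000 threshold, including a negative one
vals <- c(-40.72, 0, 1999, 2000, 3545.212, 6912.517)

# Applying the inverse after the forward transform should return the input
round_trip_ok <- isTRUE(all.equal(fspec_1(fspec(vals)), vals))
round_trip_ok
```

Both functions can share the same threshold because 2000 maps to itself under the forward transform.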