Is it possible to order box plots in R by their variance?
rm(list = ls())
library(datasets)
library(ggplot2)
data(airquality)
airquality$Month <- factor(airquality$Month,
labels = c("May", "Jun", "Jul", "Aug", "Sep"))
p10 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot()
p10
How can I order the boxes so that the month with the highest variance appears on the left of the plot?
Any help at all would be greatly appreciated!
CodePudding user response:
With reorder() you could do:
library(ggplot2)
data(airquality)
airquality$Month <- factor(airquality$Month,
labels = c("May", "Jun", "Jul", "Aug", "Sep"))
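# reorder() sorts levels in ascending order of the summary value,
# so negating the variance puts the most variable month first (leftmost)
ggplot(airquality, aes(x = reorder(Month, Ozone, function(x) -var(x, na.rm = TRUE)), y = Ozone)) +
  geom_boxplot()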
#> Warning: Removed 37 rows containing non-finite values (stat_boxplot).
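If you want to check which order this produces, or prefer to set the levels yourself, here is a minimal base R sketch of the same idea. The names aq, ozone_var and month_order are just placeholders I made up for illustration; it works on a copy of the dataset and assumes Month is still the original numeric column.

```r
# Work on a copy so the built-in dataset is left untouched
aq <- datasets::airquality

# Variance of Ozone within each month (names are the month numbers "5" to "9")
ozone_var <- tapply(aq$Ozone, aq$Month, var, na.rm = TRUE)

# Month numbers sorted from highest to lowest variance
month_order <- as.integer(names(sort(ozone_var, decreasing = TRUE)))

# Set the factor levels explicitly in that order
aq$Month <- factor(month.abb[aq$Month], levels = month.abb[month_order])

levels(aq$Month)
```

Plotting aq with aes(x = Month, y = Ozone) and geom_boxplot() should then draw the boxes in that level order, since ggplot2 follows the factor levels on a discrete axis.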
CodePudding user response:
Here's a full tidyverse-based reprex:
library(tidyverse)
airquality %>%
group_by(Month) %>%
mutate(var = var(Ozone, na.rm = TRUE)) %>%
ungroup() %>%
mutate(Month = fct_reorder(month.abb[Month], -var)) %>%
ggplot(aes(x = Month, y = Ozone)) +
  geom_boxplot()
#> Warning: Removed 37 rows containing non-finite values (stat_boxplot).
Created on 2022-04-18 by the reprex package (v2.0.1)
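A possibly shorter variant, as a sketch rather than a tested reprex: fct_reorder() can apply the summary function itself through its .fun argument, with .desc = TRUE for descending order, so the intermediate var column is not strictly needed. Here the rows with missing Ozone are filtered out first, which should also avoid the "Removed 37 rows" warning. The names aq_sorted and p are placeholders of mine.

```r
library(dplyr)
library(forcats)
library(ggplot2)

aq_sorted <- datasets::airquality %>%
  # Drop rows with missing Ozone up front so var() and the boxplot see complete data
  filter(!is.na(Ozone)) %>%
  # Order months by the variance of Ozone, largest first
  mutate(Month = fct_reorder(month.abb[Month], Ozone, .fun = var, .desc = TRUE))

p <- ggplot(aq_sorted, aes(x = Month, y = Ozone)) +
  geom_boxplot()
p
```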