How to sort levels of a factor according to the levels of other factors (Nested sorting) for ggplot2

Time:09-28

I'm struggling with a simple problem that would take one second to solve in Excel, but I can't find a solution in R. I've looked at many other posts and tried different approaches, but none of them work.

Here is an example of what my data looks like imagining we are dealing with cars. I have two factors: "brand" and "model" (nested in "brand"); and a "variable".

brand = c("Mercedes","Mercedes","Mercedes","Mercedes","Mercedes",
          "Mercedes","Mercedes","Mercedes","Mercedes","BMW",
          "BMW","BMW","BMW","BMW","BMW","BMW","BMW","BMW")
model = c("SL_class", "SL_class", "SL_class", "A_class", 
          "A_class", "A_class", "E_class", "E_class", 
          "E_class", "4 Series", "4 Series", "4 Series", 
          "X1", "X1", "X1", "Z4", "Z4", "Z4")
variable = c(5,6,7,12,13,14,1,2,3,7,8,9,22,24,25,11,12,14)

data = data.frame(brand, model, variable)
data


data$brand <- factor(data$brand)
data$model <- factor(data$model)

I would like to plot these data in a way that I have x = variable and y = model:

library(tidyverse)

ggplot(data, aes(x = variable, y = model, color = brand)) +
  geom_boxplot()

But I would also like "model" to be sorted by "brand" and then by "model", alphabetically. Like this, but without having to specify all the levels of my factor manually. My original dataset is quite big, and I would like to find an automatic way to do it:

data$model <- factor(data$model, 
                     levels = c("4 Series", "X1", "Z4", 
                                "A_class", "E_class", "SL_class"))
data$model = fct_rev(data$model)

ggplot(data, aes(x = variable, y = model, color = brand)) +
  geom_boxplot()

In Excel, I would just custom sort my data, specifying "brand" as the first level of sorting (from A to Z) and "model" as the second level of sorting (from A to Z).

In addition to this sorting, I would also like to be able to sort the levels of "model" first by "brand" (as before) and then by the median value of "variable" (largest to smallest).

I can manage to sort them by the median value of "variable" (see below), but I can't find a way to order them by "brand" first.

data %>% 
  mutate(model = fct_reorder(model, variable, .fun = median)) %>% 
  ggplot(., aes(x = variable, y = model, color = brand)) +
  geom_boxplot()

Could someone please help me? Thank you

CodePudding user response:

Does this do what you want?

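I think arrange() followed by fct_inorder() will be your best drop-in replacement for that Excel functionality: sort the rows first, then let the factor levels follow the order in which each model first appears.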

data %>%
  arrange(desc(as.character(brand)), desc(as.character(model))) %>%
  mutate(model = fct_inorder(model)) %>%
  ggplot(aes(x = variable, y = model, color = brand)) +
  geom_boxplot()

or, equivalently:

data %>%
  arrange(as.character(brand), as.character(model)) %>%
  mutate(model = fct_inorder(model) %>% fct_rev) %>%
  ggplot(aes(x = variable, y = model, color = brand)) +
  geom_boxplot() 
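
If you want to check the resulting level order without plotting, here is a minimal sketch of the same idea in base R. It rebuilds the example data from the question; the names ord and nested_levels are only illustrative. Note that order() sorts character data using the session locale, while recent dplyr versions (as far as I know) default arrange() to the C locale, so the two may disagree for mixed-case or accented names.

```r
# Example data from the question, rebuilt compactly with rep()
brand <- rep(c("Mercedes", "BMW"), each = 9)
model <- rep(c("SL_class", "A_class", "E_class", "4 Series", "X1", "Z4"), each = 3)
variable <- c(5, 6, 7, 12, 13, 14, 1, 2, 3, 7, 8, 9, 22, 24, 25, 11, 12, 14)
data <- data.frame(brand, model, variable)

# Row order when sorting by brand first, then by model (both A to Z)
ord <- order(as.character(data$brand), as.character(data$model))

# Models in order of first appearance after sorting (what fct_inorder() would use)
nested_levels <- unique(as.character(data$model)[ord])

# Reverse the levels so the first model is drawn at the top of the y axis
data$model <- factor(data$model, levels = rev(nested_levels))
levels(data$model)
```

The last level ends up at the top of the y axis, which is why the levels are reversed before plotting.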


Or, for your other question (sorting by brand, then by variable):

data %>%
  arrange(as.character(brand), variable) %>%
  mutate(model = fct_inorder(model) %>% fct_rev) %>%
  ggplot(aes(x = variable, y = model, color = brand)) +
  geom_boxplot() 
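
Note that the snippet above sorts the rows by the raw values of variable, so each model is placed according to its smallest value within its brand, not its median. If you really need the median, largest to smallest, a possible variant (just a sketch; the helper column med and the objects sorted and p are names I made up) is to compute the per-model median first and sort on that:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Example data from the question, rebuilt compactly with rep()
brand <- rep(c("Mercedes", "BMW"), each = 9)
model <- rep(c("SL_class", "A_class", "E_class", "4 Series", "X1", "Z4"), each = 3)
variable <- c(5, 6, 7, 12, 13, 14, 1, 2, 3, 7, 8, 9, 22, 24, 25, 11, 12, 14)
data <- data.frame(brand, model, variable)

sorted <- data %>%
  group_by(brand, model) %>%
  mutate(med = median(variable)) %>%   # per-model median, repeated on each row
  ungroup() %>%
  # brand A to Z, then median largest to smallest; model name breaks ties
  arrange(as.character(brand), desc(med), as.character(model)) %>%
  # levels follow first appearance; reversed so the first model is drawn on top
  mutate(model = fct_rev(fct_inorder(as.character(model))))

levels(sorted$model)

p <- ggplot(sorted, aes(x = variable, y = model, color = brand)) +
  geom_boxplot()
```

Printing p draws the plot, with the highest-median model of each brand at the top of its group.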
