Ordering a stacked bar graph by second variable changing over time

Time:11-03

I've looked at a number of answers (here, here, here), but none of them yields the result I want. I have a data set of industry volumes over time. I would like the stacked bars ordered by volume within each month. This means the stacking order should differ from month to month whenever the relative volumes change.

Below is a truncated sample of the data:

test <- structure(list(Date = structure(c(18506, 18506, 18506, 18506, 
18506, 18506, 18536, 18536, 18536, 18536, 18536, 18536, 18567, 
18567, 18567, 18567, 18567, 18567), class = "Date"), Industry = c("Investment", 
"Telecoms", "Mortgage & Loans", "Banking", "Insurance", "Credit Cards", 
"Telecoms", "Investment", "Mortgage & Loans", "Banking", "Credit Cards", 
"Insurance", "Investment", "Telecoms", "Mortgage & Loans", "Credit Cards", 
"Insurance", "Banking"), volume = c(775349, 811294, 3144684, 
4427814, 7062691, 9377254, 1210194, 1735033, 3539406, 6952688, 
8858649, 9076391, 670934, 869452, 3542294, 5132132, 6953113, 
6954535)), row.names = c(NA, -18L), groups = structure(list(Date = structure(c(18506, 
18536, 18567), class = "Date"), .rows = structure(list(1:6, 7:12, 
    13:18), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

Here is the graph so far:

#A. Library
library(dplyr)
library(ggplot2)
library(ggtext)
library(scales)

#B. Graph
graph <- test %>%
    
    ggplot(aes(x=Date)) +
    
    ##1. Bar graph
    geom_bar(aes(x=Date, y=volume, fill = Industry), stat="identity") +
    
    ##2. Graph title and Axis labels
    ggtitle(label = "**Volume**",
            subtitle = "By Industry") +
    ylab("Volume (Millions)") +
    xlab("") +
    
    ##3. Scales
    scale_fill_manual(values=c("#e3120b", "#336666", "#FB9851", "#acc8d4", 
                               "#dbcc98", "#36E2BD")) +
    scale_x_date(date_breaks = "month", labels = scales::label_date_short()) +
    scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6, 
                                            accuracy = 1)) +
    
    ##4. Theme
    guides(col = guide_legend(ncol = 2, nrow = 3)) +
    theme_minimal() +
    theme(text = element_text(family = "Georgia"),
          panel.border=element_blank(), 
          axis.line=element_line(), 
          plot.title = element_markdown(color="black", size=14, hjust = .5),
          plot.subtitle = element_text(hjust = .5),
          axis.title.x = element_text(size = 9, color = "grey30"), 
          axis.title.y = element_text(size = 9, color = "grey30"), 
          legend.box.background = element_rect(color="black", size=.5),
          legend.title = element_blank(),
          legend.text = element_text(size = 6),
          legend.position = "bottom",
          strip.background = element_rect(linetype="solid"),
          panel.grid.minor.y = element_line(color = NA),
          panel.grid.minor.x = element_line(color = NA),
          plot.caption = ggtext::element_markdown(hjust = 1, size = 7, 
                                                  color = "#7B7D7D"))

From what I understand, ggplot sorts the stacked bars by the factor order. I tried test %>% arrange(Date, volume) but then got stuck on how I could change the factors by month rather than just the static order of the factors. I could create a separate bar for each month with a separate factor but this gets cumbersome if I want to add multiple years to the graph.
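To illustrate the sticking point, here is a minimal sketch showing that arrange() changes the row order but leaves the factor levels alone. The `toy` data frame is made up for illustration and is not part of the real data; it also assumes Industry is an explicit factor, whereas in `test` it is a character column that ggplot treats as discrete.

```r
library(dplyr)

# Hypothetical three-row example; Industry is an explicit factor here.
toy <- tibble(
  Industry = factor(c("Banking", "Insurance", "Telecoms")),
  volume   = c(3, 1, 2)
)

sorted <- arrange(toy, volume)

as.character(sorted$Industry)  # row order now follows volume
levels(sorted$Industry)        # level order is untouched by arrange()
```

So sorting the rows alone does not give each month its own level order.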

Any help appreciated!

User response:

I've taken the liberty of boiling your example down to the essentials. As noted in the comments, I don't think there is a way around defining the factor levels for each month separately. But you can do this in a function, build a list of layers, and take advantage of the fact that a list of layers can be added to a ggplot object.

This approach scales: the code stays the same no matter how many months you have... :)

library(tidyverse)
library(lubridate)

test <- 
  test %>% 
  ## it's probably not necessary to order the data and 
  ## create the factor levels explicitly, but it gives more control
  arrange(Date) %>%
  mutate(year_mo = fct_inorder(paste(year(Date), month(Date), sep = "_")))

## split the new data by month and create different factor levels
ls_test <- 
  test %>%
  split(., .$year_mo) %>%
  map(function(x) {x$Industry <- fct_reorder(x$Industry, x$volume); x})

## make your geom_col list (geom_col is equivalent to geom_bar(stat = "identity"))
ls_p_col <- map(ls_test, function(x){
  geom_col(data = x, mapping = aes(x=year_mo, y=volume, fill = Industry))
})

# Voilà!
ggplot() +
  ls_p_col +
  scale_fill_brewer() +
  scale_x_discrete(limits = unique(test$year_mo)) # to force the correct order of your x
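To check that the per-month reordering does what is intended, it can help to look at the factor levels of each list element. Below is a rough, self-contained sketch of the same idea on made-up data; `toy`, `by_month` and `layers` are illustrative names, and base `lapply()` stands in for `map()` to keep the dependencies small.

```r
library(ggplot2)
library(forcats)

# Hypothetical toy data: the ranking of A and B flips between the two months.
toy <- data.frame(
  year_mo  = c("2020_9", "2020_9", "2020_10", "2020_10"),
  Industry = c("A", "B", "A", "B"),
  volume   = c(1, 2, 5, 3)
)

# One data frame per month, each with its own Industry level order.
by_month <- lapply(split(toy, toy$year_mo), function(x) {
  x$Industry <- fct_reorder(x$Industry, x$volume)
  x
})

# Levels run from smallest to largest volume within each month.
lapply(by_month, function(x) levels(x$Industry))

# A list of layers can be added to a plot in a single step.
layers <- lapply(by_month, function(x) {
  geom_col(data = x, mapping = aes(x = year_mo, y = volume, fill = Industry))
})
p <- ggplot() + layers
length(p$layers)  # one layer per month
```

If the levels printed for two months differ, the corresponding bars should stack in a different order, because each layer carries its own data and therefore its own level order.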
