How to create grouped and stacked bars

Time:09-23

I have a very large dataset with many subsidiaries serving three customer groups in various countries, something like this (in reality there are many more subsidiaries and dates):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'subsidiary': ['EU','EU','EU','EU','EU','EU','EU','EU','EU','US','US','US','US','US','US','US','US','US'],'date': ['2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05'],'business': ['RETAIL','RETAIL','RETAIL','CORP','CORP','CORP','PUBLIC','PUBLIC','PUBLIC','RETAIL','RETAIL','RETAIL','CORP','CORP','CORP','PUBLIC','PUBLIC','PUBLIC'],'value': [500.36,600.45,700.55,750.66,950.89,1300.13,100.05,120.00,150.01,800.79,900.55,1000,3500.79,5000.36,4500.25,50.17,75.25,90.33]})
print(df)

I'd like to make an analysis per subsidiary by producing a stacked bar chart. To do this, I started by defining the x-axis to be the unique months and by defining a subset per business type in a country like this:

x=df['date'].drop_duplicates() 
EUCORP = df[(df['subsidiary']=='EU') & (df['business']=='CORP')] 
EURETAIL = df[(df['subsidiary']=='EU') & (df['business']=='RETAIL')] 
EUPUBLIC = df[(df['subsidiary']=='EU') & (df['business']=='PUBLIC')] 

I can then make a bar chart per business type:

plotEUCORP = plt.bar(x=x, height=EUCORP['value'], width=.35)
plotEURETAIL = plt.bar(x=x, height=EURETAIL['value'], width=.35)
plotEUPUBLIC = plt.bar(x=x, height=EUPUBLIC['value'], width=.35)

However, if I try to stack all three together in one chart, I keep failing:

plotEURETAIL = plt.bar(x=x, height=EURETAIL['value'], width=.35)
plotEUCORP = plt.bar(x=x, height=EUCORP['value'], width=.35, bottom=EURETAIL)
plotEUPUBLIC = plt.bar(x=x, height=EUPUBLIC['value'], width=.35, bottom=EURETAIL + EUCORP)
plt.show()

I always receive the below error message:


ValueError: Missing category information for StrCategoryConverter; this might be caused by unintendedly mixing categorical and numeric data

ConversionError: Failed to convert value(s) to axis units:
  subsidiary     date business   value
0         EU  2019-03   RETAIL  500.36
1         EU  2019-04   RETAIL  600.45
2         EU  2019-05   RETAIL  700.55


I tried converting the months into a date format and/or using them as the index, but that only confused me further...

I would really appreciate any help with the following, as I have already spent many hours trying to figure this out (I am still a Python noob, sorry):

  1. How can I fix the error to create a stacked bar chart?
  2. Assuming the error can be fixed, is this the most efficient way to create the bar chart (e.g. do I really need to create three sub-DataFrames per subsidiary, or is there a more elegant way)?
  3. Would it be possible to write a loop that produces a stacked bar chart per country, so that I don't need to code one by hand for each subsidiary?

CodePudding user response:

  • As an FYI, stacked bars are not the best option, because they can make it difficult to compare bar values and can easily be misinterpreted. The purpose of a visualization is to present data in an easily understood format; make sure the message is clear. Side-by-side bars are often a better option.
  • Side-by-side stacked bars are difficult to construct manually; it's better to use a figure-level plotting method.

    Create grouped and stacked bars
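The figure that originally illustrated this section is not preserved. As a rough sketch only, side-by-side stacked bars could be built by hand with matplotlib roughly as follows, assuming the sample data from the question. The names `months`, `wide`, `block`, and `colors` are illustrative and not from the original answer. Note that `bottom` is a numeric array of running totals; passing a whole DataFrame there, as in the question, is what the ConversionError points to.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt compactly.
months = ['2019-03', '2019-04', '2019-05']
values = {
    ('EU', 'RETAIL'): [500.36, 600.45, 700.55],
    ('EU', 'CORP'): [750.66, 950.89, 1300.13],
    ('EU', 'PUBLIC'): [100.05, 120.00, 150.01],
    ('US', 'RETAIL'): [800.79, 900.55, 1000],
    ('US', 'CORP'): [3500.79, 5000.36, 4500.25],
    ('US', 'PUBLIC'): [50.17, 75.25, 90.33],
}
df = pd.DataFrame(
    [(sub, m, bus, v) for (sub, bus), vals in values.items() for m, v in zip(months, vals)],
    columns=['subsidiary', 'date', 'business', 'value'],
)

# One row per (date, subsidiary), one column per business type.
wide = df.pivot_table(index=['date', 'subsidiary'], columns='business',
                      values='value', aggfunc='sum')
dates = wide.index.get_level_values('date').unique()
subsidiaries = wide.index.get_level_values('subsidiary').unique()

x = np.arange(len(dates))            # numeric position of each date group
width = 0.8 / len(subsidiaries)      # width of one stacked bar within a group
colors = {bus: f'C{k}' for k, bus in enumerate(wide.columns)}

fig, ax = plt.subplots(figsize=(8, 5))
for i, sub in enumerate(subsidiaries):
    block = wide.xs(sub, level='subsidiary').reindex(dates).fillna(0)
    bottom = np.zeros(len(dates))    # running total: where the next segment starts
    for business in wide.columns:
        heights = block[business].to_numpy()
        ax.bar(x + i * width, heights, width=width, bottom=bottom,
               color=colors[business], edgecolor='white',
               # label each business only once so the legend has no duplicates
               label=business if i == 0 else '_nolegend_')
        bottom = bottom + heights

# Label every bar with its subsidiary and date.
ax.set_xticks([xi + i * width for xi in x for i in range(len(subsidiaries))])
ax.set_xticklabels([f'{sub}\n{d}' for d in dates for sub in subsidiaries])
ax.set_ylabel('value')
ax.legend(title='business')
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the result.
```

Each date forms a group, and each subsidiary contributes one stacked bar to that group.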


      Create only stacked bars
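The original example for this section is not preserved either. A minimal sketch, assuming the sample data from the question, could reshape each subsidiary's rows with `pivot_table` and let pandas do the stacking, so no separate sub-DataFrame per business type is needed. The loop draws one chart per subsidiary. The names `wide` and `totals` are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, rebuilt compactly.
months = ['2019-03', '2019-04', '2019-05']
values = {
    ('EU', 'RETAIL'): [500.36, 600.45, 700.55],
    ('EU', 'CORP'): [750.66, 950.89, 1300.13],
    ('EU', 'PUBLIC'): [100.05, 120.00, 150.01],
    ('US', 'RETAIL'): [800.79, 900.55, 1000],
    ('US', 'CORP'): [3500.79, 5000.36, 4500.25],
    ('US', 'PUBLIC'): [50.17, 75.25, 90.33],
}
df = pd.DataFrame(
    [(sub, m, bus, v) for (sub, bus), vals in values.items() for m, v in zip(months, vals)],
    columns=['subsidiary', 'date', 'business', 'value'],
)

subsidiaries = sorted(df['subsidiary'].unique())
fig, axes = plt.subplots(1, len(subsidiaries),
                         figsize=(5 * len(subsidiaries), 4), squeeze=False)

totals = {}  # total bar height per date, kept only for inspection
for ax, sub in zip(axes.ravel(), subsidiaries):
    # Rows become dates, columns become business types.
    wide = df[df['subsidiary'] == sub].pivot_table(
        index='date', columns='business', values='value', aggfunc='sum')
    # stacked=True makes pandas compute the bottoms of the segments itself.
    wide.plot(kind='bar', stacked=True, ax=ax, rot=0, title=sub)
    ax.set_ylabel('value')
    totals[sub] = wide.sum(axis=1)

fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the result.
```

Because the months stay plain strings on a categorical axis, no date conversion is required for this kind of chart.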

