How to create a plot with stacked and labeled line segments

Time:10-10

I want to create a sort of stacked bar chart (I don't know the proper name). I hand-drew the graph for the years 2016 and 2017 and attached it here.

[Image: hand-drawn sketch of the desired plot for 2016 and 2017]

The code to create the df is below:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
 [2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
 [2018.0, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012],
 [2019.0, 0.4705, 0.4247, 0.3857, 0.3333, 0.2457],
 [2020.0, 0.4755, 0.4196, 0.3971, 0.3825, 0.2965]]

cols = ['attribute_time', '100-81 percentile', '80-61 percentile', '60-41 percentile', '40-21 percentile', '20-0 percentile']
df = pd.DataFrame(data, columns=cols)
# Set seaborn plotting aesthetics
sns.set(style='white')
# Create stacked bar chart
df.set_index('attribute_time').plot(kind='bar', stacked=True)

The data doesn't need to stack on top of each other. The code above creates a stacked bar chart, but that's not quite what I need to display. Each percentile value should be drawn as a labeled horizontal line at its value, above the corresponding year on the x-axis. Does anyone have recommendations on how to achieve this? Is it some kind of modified stacked bar chart?

CodePudding user response:

My approach to this is to represent the data as a categorical scatter plot (stripplot in Seaborn) using horizontal lines rather than points as markers. You'll have to make some choices about exactly how and where you want to plot things, but this should get you started!

I first modified the data a little bit:

df['attribute_time'] = df['attribute_time'].astype('int') # Just to get rid of the decimals.

df = df.melt(id_vars = ['attribute_time'],
    value_name = 'pct_value',
    var_name = 'pct_range')

Melting the DataFrame takes the wide data and makes it long instead, so the columns are now attribute_time, pct_range, and pct_value, and there is one row for each data point.
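To make the reshaping concrete, here is a minimal sketch of the same melt call on a trimmed copy of the question's data (two years, two percentile columns). The names wide and long_df are just for illustration.

```python
import pandas as pd

# Two rows and two percentile columns from the question's data, for brevity.
wide = pd.DataFrame(
    [[2016, 0.4862, 0.1196],
     [2017, 0.4471, 0.1387]],
    columns=['attribute_time', '100-81 percentile', '20-0 percentile'],
)

# Each (year, percentile column) pair becomes its own row.
long_df = wide.melt(id_vars=['attribute_time'],
                    value_name='pct_value',
                    var_name='pct_range')

print(long_df)
```

The two rows by two value columns become four rows, one per year and percentile range.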

Next is the plotting:

fig, ax = plt.subplots()

sns.stripplot(data = df,
    x = 'attribute_time',
    y = 'pct_value',
    hue = 'pct_range',
    jitter = False,
    marker = '_',
    s = 40,
    linewidth = 3,
    ax = ax)

Instead of labeling each point with the range that it belongs to, I thought it would be a lot cleaner to separate them into ranges by color.

Jitter spreads points apart when many points in one category would otherwise overlap. We don't need to worry about that here, so I turned it off. The marker style '_' is matplotlib's horizontal-line marker, which is what turns the categorical scatter plot into a set of horizontal lines. There's obviously a lot you can tweak to make it look how you want, such as sizing and labeling, but hopefully this is at least a good start!
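If you do want a text label on every line, as in the hand-drawn sketch, one alternative is to skip seaborn and draw the segments directly with matplotlib's ax.hlines and ax.text. This is only a rough sketch working from the original wide DataFrame; the helper name plot_labeled_segments and its half_width parameter are made up for this example, and the label offset and font size are guesses you would likely adjust.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
        [2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
        [2018.0, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012],
        [2019.0, 0.4705, 0.4247, 0.3857, 0.3333, 0.2457],
        [2020.0, 0.4755, 0.4196, 0.3971, 0.3825, 0.2965]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile',
        '60-41 percentile', '40-21 percentile', '20-0 percentile']
df_wide = pd.DataFrame(data, columns=cols)


def plot_labeled_segments(wide, ax=None, half_width=0.25):
    """Hypothetical helper: one labeled horizontal segment per year and range."""
    if ax is None:
        _, ax = plt.subplots()
    years = [str(int(y)) for y in wide['attribute_time']]
    value_cols = [c for c in wide.columns if c != 'attribute_time']
    # Years sit at integer x positions 0, 1, 2, ... like a categorical axis.
    for pos, (_, row) in enumerate(wide.iterrows()):
        for col in value_cols:
            y = row[col]
            ax.hlines(y, pos - half_width, pos + half_width,
                      colors='tab:blue', linewidth=2)
            # Label each segment just to its right with the range, e.g. "100-81".
            ax.text(pos + half_width + 0.03, y,
                    col.replace(' percentile', ''),
                    va='center', fontsize=7)
    ax.set_xticks(range(len(years)))
    ax.set_xticklabels(years)
    ax.set_xlim(-0.5, len(years) - 0.2)  # extra room on the right for labels
    ax.set_xlabel('attribute_time')
    ax.set_ylabel('pct_value')
    return ax


ax = plot_labeled_segments(df_wide)
```

Call plt.show() or ax.figure.savefig(...) afterwards to see the result. Labels for values that are very close together (for example 0.4115 and 0.3905 in 2016) may crowd each other, so you might need a smaller font or a manual vertical offset.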
