How to create and annotate a stacked proportional bar chart

I'm struggling to create a stacked bar chart derived from value_counts() of the columns of a dataframe.

Assume a dataframe like the following, where the responder column is not important; I would like to stack the counts of [1,2,3,4,5] for every q# column.

responder, q1, q2, q3, q4, q5
------------------------------
r1, 5, 3, 2, 4, 1
r2, 3, 5, 1, 4, 2
r3, 2, 1, 3, 4, 5
r4, 1, 4, 5, 3, 2
r5, 1, 2, 5, 3, 4
r6, 2, 3, 4, 5, 1
r7, 4, 3, 2, 1, 5
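For reference, the sample table can be loaded into pandas and counted per question as shown here. This is a sketch: the dict name data and its layout are assumptions. Each column's value_counts() is reindexed to 1..5 because a value that never occurs in a column (2 never appears in q4 in this sample) would otherwise show up as NaN when the per-column results are combined.

```python
import pandas as pd

## the question's sample table; responder only serves as the index
data = {
    "responder": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "q1": [5, 3, 2, 1, 1, 2, 4],
    "q2": [3, 5, 1, 4, 2, 3, 3],
    "q3": [2, 1, 3, 5, 5, 4, 2],
    "q4": [4, 4, 4, 3, 3, 5, 1],
    "q5": [1, 2, 5, 2, 4, 1, 5],
}
df = pd.DataFrame(data).set_index("responder")

## rows: response values 1..5, columns: q1..q5; missing values are filled with 0
counts = df.apply(lambda col: col.value_counts().reindex(range(1, 6), fill_value=0))

## every column sums to the number of responders, so dividing by the column
## totals gives each segment's share of its bar
percents = counts / counts.sum()
print(counts)
```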

It should look something like the example chart (image not included), except that each bar would be labeled by q# and would include 5 sections for the counts of [1,2,3,4,5] from the data.

Ideally, all bars will be "100%" wide, showing each count as a proportion of the bar. But since each responder row is guaranteed to have one entry for each q# column, every bar already has the same total, so the percentage is just a bonus if possible.

Any help would be much appreciated, with a slight preference for a matplotlib solution.

Answer 1:

You can calculate the bar lengths as percentages and draw the stacked bar plot with ax = percents.T.plot(kind='barh', stacked=True), where percents is a DataFrame with q1, ..., q5 as columns and the response values 1, ..., 5 as the index.

>>> percents
         q1        q2        q3        q4        q5
1  0.196873  0.199316  0.206644  0.194919  0.202247
2  0.205357  0.188988  0.205357  0.205357  0.194940
3  0.202265  0.217705  0.184766  0.196089  0.199177
4  0.199494  0.199494  0.190886  0.198481  0.211646
5  0.196137  0.195146  0.211491  0.205052  0.192174

Then you can loop over ax.patches to add a label to every bar segment. The labels can be generated from the original counts DataFrame: counts = df.apply(lambda x: x.value_counts())

>>> counts
    q1   q2   q3   q4   q5
1  403  408  423  399  414
2  414  381  414  414  393
3  393  423  359  381  387
4  394  394  377  392  418
5  396  394  427  414  388

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## create some data similar to yours
np.random.seed(42)
categories = ['q1','q2','q3','q4','q5']
df = pd.DataFrame(np.random.randint(1,6,size=(2000, 5)), columns=categories)

## counts will be used for the labels
counts = df.apply(lambda x: x.value_counts())

## percents will be used to determine the height of each bar
percents = counts.div(counts.sum(axis=1), axis=0)

counts_array = counts.values
nrows, ncols = counts_array.shape
## ax.patches is ordered series by series (all q# segments for value 1, then value 2, ...),
## so patch i corresponds to counts_array[i // ncols, i % ncols]
indices = [(i, j) for i in range(nrows) for j in range(ncols)]

ax = percents.T.plot(kind='barh', stacked=True)
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper right')
for i, p in enumerate(ax.patches):
    ## p.get_width() is a fraction, so format it with the % presentation type
    ax.annotate(f"({p.get_width():.1%})", (p.get_x() + p.get_width() - 0.15, p.get_y() - 0.10), xytext=(5, 10), textcoords='offset points')
    ## the raw count for the same segment
    ax.annotate(str(counts_array[indices[i]]), (p.get_x() + p.get_width() - 0.15, p.get_y() + 0.10), xytext=(5, 10), textcoords='offset points')
plt.show()
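In this answer, percents = counts.div(counts.sum(axis=1), axis=0) divides each response value's count by that value's total across all five questions, so the rows of percents sum to 1 (as in the printed percents output), while each q# bar only sums to roughly 1 because the random data is nearly uniform. If each bar should be exactly 100% wide, as the question asks, the counts can instead be divided by each column's total. A minimal sketch of that variant, which also swaps the manual ax.patches loop for Axes.bar_label (assumption: matplotlib 3.4 or newer, where bar_label was added):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## same kind of random data as in the answer
np.random.seed(42)
categories = ['q1', 'q2', 'q3', 'q4', 'q5']
df = pd.DataFrame(np.random.randint(1, 6, size=(2000, 5)), columns=categories)

## counts: rows are the response values 1..5, columns are the questions
counts = df.apply(lambda col: col.value_counts().reindex(range(1, 6), fill_value=0))

## normalize each question (column) by its own total, so every bar sums to 1
percents = counts / counts.sum()

ax = percents.T.plot(kind='barh', stacked=True, figsize=(10, 5))
ax.legend(bbox_to_anchor=(1.01, 1), loc='upper left')

## pandas adds one container per response value, holding one segment per question
for value, container in zip(percents.index, ax.containers):
    labels = [f"{pct:.1%}\n({n})" for pct, n in zip(percents.loc[value], counts.loc[value])]
    ax.bar_label(container, labels=labels, label_type='center', fontsize=8)

plt.tight_layout()
plt.show()
```

Iterating over ax.containers instead of ax.patches avoids the manual index bookkeeping, since each container already corresponds to one legend entry.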

Answer 2:

Bar annotations are most easily implemented with matplotlib.pyplot.bar_label (or the Axes.bar_label method), available since matplotlib 3.4.0.

Transposing df with df = pd.DataFrame(data).set_index('responder').T swaps the index and columns, so each q# becomes a bar instead of each responder. The figsize=(12, 10) used for the plot may need to be adjusted.
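The plotting code for this answer is not included here. A minimal sketch of the approach, assuming the question's sample table is typed in as the dict data (its exact layout is an assumption) and matplotlib 3.4 or newer for bar_label, could look like this. It builds the df and per DataFrames printed below and labels each segment with its percentage and the original value:

```python
import pandas as pd
import matplotlib.pyplot as plt

## the question's sample table (assumed layout)
data = {
    'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'q1': [5, 3, 2, 1, 1, 2, 4],
    'q2': [3, 5, 1, 4, 2, 3, 3],
    'q3': [2, 1, 3, 5, 5, 4, 2],
    'q4': [4, 4, 4, 3, 3, 5, 1],
    'q5': [1, 2, 5, 2, 4, 1, 5],
}

## responder as the index; append .T here to make q1..q5 the bars instead
df = pd.DataFrame(data).set_index('responder')

## divide each row by its total so every bar is 100% wide
per = df.div(df.sum(axis=1), axis=0)

ax = per.plot(kind='barh', stacked=True, figsize=(12, 10))
ax.legend(bbox_to_anchor=(1.01, 1), loc='upper left')

## ax.containers holds one BarContainer per column of per, in column order
for col, container in zip(per.columns, ax.containers):
    ## label each segment with its percentage and the original value from df
    labels = [f'{p:.0%} ({v})' for p, v in zip(per[col], df[col])]
    ax.bar_label(container, labels=labels, label_type='center')

plt.show()
```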

DataFrames

df

           q1  q2  q3  q4  q5
responder                    
r1          5   3   2   4   1
r2          3   5   1   4   2
r3          2   1   3   4   5
r4          1   4   5   3   2
r5          1   2   5   3   4
r6          2   3   4   5   1
r7          4   3   2   1   5

per

                 q1        q2        q3        q4        q5
responder                                                  
r1         0.333333  0.200000  0.133333  0.266667  0.066667
r2         0.200000  0.333333  0.066667  0.266667  0.133333
r3         0.133333  0.066667  0.200000  0.266667  0.333333
r4         0.066667  0.266667  0.333333  0.200000  0.133333
r5         0.066667  0.133333  0.333333  0.200000  0.266667
r6         0.133333  0.200000  0.266667  0.333333  0.066667
r7         0.266667  0.200000  0.133333  0.066667  0.333333

Referenced

How to put the legend out of the plot shows various ways to format and move the legend.
Adding value labels on a matplotlib bar chart provides a detailed explanation of .bar_label.
How to add multiple annotations to a barplot
stack bar plot in matplotlib and add label to each section
How to annotate barplot with percent by hue/legend group
How to add percentages on top of bars in seaborn