I'm struggling to create a stacked bar chart derived from value_counts() of the columns of a dataframe.
Assume a dataframe like the following, where responder is not important, but I would like to stack the counts of [1,2,3,4,5] for all q# columns.
responder, q1, q2, q3, q4, q5
------------------------------
r1, 5, 3, 2, 4, 1
r2, 3, 5, 1, 4, 2
r3, 2, 1, 3, 4, 5
r4, 1, 4, 5, 3, 2
r5, 1, 2, 5, 3, 4
r6, 2, 3, 4, 5, 1
r7, 4, 3, 2, 1, 5
The result should look something like this, except each bar would be labeled by q#
and it would include 5 sections for the counts of [1,2,3,4,5]
from the data:
Ideally, all bars will be "100%" wide, showing each count as a proportion of the bar. But it's guaranteed that each responder row will have one entry for each question, so the percentage is just a bonus if possible.
Any help would be much appreciated, with a slight preference for a matplotlib solution.
CodePudding user response:
You can calculate the length of each bar segment as a percentage and obtain the stacked bar plot using ax = percents.T.plot(kind='barh', stacked=True), where percents is a DataFrame with q1,...,q5 as columns and 1,...,5 as the index.
>>> percents
q1 q2 q3 q4 q5
1 0.196873 0.199316 0.206644 0.194919 0.202247
2 0.205357 0.188988 0.205357 0.205357 0.194940
3 0.202265 0.217705 0.184766 0.196089 0.199177
4 0.199494 0.199494 0.190886 0.198481 0.211646
5 0.196137 0.195146 0.211491 0.205052 0.192174
Then you can use ax.patches
to add labels for every bar. Labels can be generated from the original counts DataFrame: counts = df.apply(lambda x: x.value_counts())
>>> counts
q1 q2 q3 q4 q5
1 403 408 423 399 414
2 414 381 414 414 393
3 393 423 359 381 387
4 394 394 377 392 418
5 396 394 427 414 388
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## create some data similar to yours
np.random.seed(42)
categories = ['q1','q2','q3','q4','q5']
df = pd.DataFrame(np.random.randint(1,6,size=(2000, 5)), columns=categories)

## counts will be used for the labels
counts = df.apply(lambda x: x.value_counts())

## percents will be used to determine the length of each bar segment
percents = counts.div(counts.sum(axis=1), axis=0)

counts_array = counts.values
nrows, ncols = counts_array.shape
## ax.patches is ordered series by series (value 1 for q1..q5, then value 2, ...),
## which matches row-major order over counts_array
indices = [(i,j) for i in range(0,nrows) for j in range(0,ncols)]

ax = percents.T.plot(kind='barh', stacked=True)
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper right')

for i, p in enumerate(ax.patches):
    ## p.get_width() is a fraction, so format it as a percentage
    ax.annotate(f"({p.get_width():.2%})", (p.get_x() + p.get_width() - 0.15, p.get_y() - 0.10), xytext=(5, 10), textcoords='offset points')
    ax.annotate(str(counts_array[indices[i]]), (p.get_x() + p.get_width() - 0.15, p.get_y() + 0.10), xytext=(5, 10), textcoords='offset points')
plt.show()
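Note that percents above divides each row of counts by its row total, so the five segments of a single q# bar are not guaranteed to add up to exactly 1. If every bar should span exactly 100%, one option is to divide each column by its own total instead. The following is a minimal sketch under that assumption, using the seven sample rows from the question; the name col_percents and the reindex step (which fills in answer values that never occur in a column) are introduced here for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question (the responder column is omitted)
df = pd.DataFrame({
    'q1': [5, 3, 2, 1, 1, 2, 4],
    'q2': [3, 5, 1, 4, 2, 3, 3],
    'q3': [2, 1, 3, 5, 5, 4, 2],
    'q4': [4, 4, 4, 3, 3, 5, 1],
    'q5': [1, 2, 5, 2, 4, 1, 5],
})

# Count how often each answer 1..5 occurs per question; answers that
# never occur in a column get a count of 0 instead of NaN
counts = df.apply(lambda s: s.value_counts().reindex(range(1, 6), fill_value=0))

# Divide each column by its own total so every q# bar sums to 1
col_percents = counts.div(counts.sum(axis=0), axis=1)

# One horizontal bar per question, stacked by answer value
ax = col_percents.T.plot(kind='barh', stacked=True)
ax.set_xlabel('proportion of responses')
ax.legend(title='answer', bbox_to_anchor=(1.02, 1), loc='upper left')
```

Call plt.show() afterwards to display the figure. The annotation loop from the answer above should work unchanged on this ax, since the patch order is the same.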
CodePudding user response:
- Bar annotations are most easily implemented with .bar_label.
- Transposing df with df = pd.DataFrame(data).set_index('responder').T swaps the index and columns, to produce the following plot. figsize=(12, 10) may need to be adjusted.
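A minimal sketch of how the .bar_label approach could look, assuming matplotlib 3.4 or later. This is an illustration, not the answer's original code: the data dict is rebuilt from the table in the question, per is assumed to be each row divided by its row total (which reproduces the per values listed under DataFrames), and the label format is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rebuilt from the table in the question
data = {
    'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'q1': [5, 3, 2, 1, 1, 2, 4],
    'q2': [3, 5, 1, 4, 2, 3, 3],
    'q3': [2, 1, 3, 5, 5, 4, 2],
    'q4': [4, 4, 4, 3, 3, 5, 1],
    'q5': [1, 2, 5, 2, 4, 1, 5],
}

# Appending .T here, as the bullet above mentions, would swap the index
# and columns and give one bar per question instead of one per responder
df = pd.DataFrame(data).set_index('responder')

# Assumption: per is each value divided by the total of its row
per = df.div(df.sum(axis=1), axis=0)

ax = per.plot(kind='barh', stacked=True, figsize=(12, 10))
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left')

# Each container holds the segments of one column; label every
# segment at its center with its share of the bar
for container in ax.containers:
    labels = [f'{bar.get_width():.1%}' for bar in container]
    ax.bar_label(container, labels=labels, label_type='center')
```

Call plt.show() to display the result. Passing explicit labels keeps the formatting under your control; label_type='center' places each label inside its segment rather than at the edge.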
DataFrames
df
           q1  q2  q3  q4  q5
responder
r1          5   3   2   4   1
r2          3   5   1   4   2
r3          2   1   3   4   5
r4          1   4   5   3   2
r5          1   2   5   3   4
r6          2   3   4   5   1
r7          4   3   2   1   5
per
                 q1        q2        q3        q4        q5
responder
r1         0.333333  0.200000  0.133333  0.266667  0.066667
r2         0.200000  0.333333  0.066667  0.266667  0.133333
r3         0.133333  0.066667  0.200000  0.266667  0.333333
r4         0.066667  0.266667  0.333333  0.200000  0.133333
r5         0.066667  0.133333  0.333333  0.200000  0.266667
r6         0.133333  0.200000  0.266667  0.333333  0.066667
r7         0.266667  0.200000  0.133333  0.066667  0.333333
Referenced
- How to put the legend out of the plot shows various ways to format and move the legend.
- Adding value labels on a matplotlib bar chart provides a detailed explanation of .bar_label.
- How to add multiple annotations to a barplot
- stack bar plot in matplotlib and add label to each section
- How to annotate barplot with percent by hue/legend group
- How to add percentages on top of bars in seaborn
- Transposing