I have a pandas dataframe that looks like this:
  Box  Value
0   A     90
1   B     16
2   B     26
3   B     36
4   C     92
5   A     12
6   B     14
7   C     84
8   A     27
9   B     11
I want to create a histogram (stacked or grouped) in plotly that displays, for each box, the percentage of that box's values falling into a certain bin. For example, if box A contains 5 values in the bin 0-10 and a total of 20 values, then the bar for box A in that bin should display 25%, and so on.
Note: I don't want the percentage relative to the total number of values from all boxes in a bin. I want the percentage within each box for a particular bin.
What changes do I need to make here?
px.histogram(sample_data,
             x="Value",  # column to bin
             color="Box",
             nbins=10,
             barmode="group",
             range_x=[0, 100],
             )
Answer:
imports
import pandas as pd
import plotly.express as px
data
###dummy data
box = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B']
values = [1,1,1,9,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,1]
df = pd.DataFrame({'box':box, 'values':values})
df
data transforms
### bin your data
bins = [0, 10, 25, 50, 100]
df['binned'] = pd.cut(df['values'], bins)
df_group = df.groupby(['box','binned'], as_index=False).count()
### calculate percentages per variable
# after .count(), 'values' holds the number of rows per (box, bin); divide by each box's total
df_group['percentage'] = 100 * df_group['values'] / df_group.groupby('box')['values'].transform('sum')
df_group
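As a cross-check on the percentage column, the same per-box percentages can be sketched with pd.crosstab and normalize="index". This is only an illustration on the dummy data above; the string bin labels ("0-10" and so on) are an assumption made here for readability and are not part of the original answer.

```python
import pandas as pd

# Same dummy data as above: 20 values in box A, 3 values in box B.
df = pd.DataFrame({
    "box": ["A"] * 20 + ["B"] * 3,
    "values": [1, 1, 1, 9, 10] + [11] * 15 + [11, 11, 1],
})

bins = [0, 10, 25, 50, 100]
labels = ["0-10", "10-25", "25-50", "50-100"]  # hypothetical labels, chosen for this sketch
df["binned"] = pd.cut(df["values"], bins, labels=labels).astype(str)

# normalize="index" divides each row (one box) by that row's total,
# so after scaling by 100 every row sums to 100.
pct = pd.crosstab(df["box"], df["binned"], normalize="index") * 100
print(pct.round(2))
```

Box A has 5 of its 20 values in the first bin, which matches the 25% example from the question.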
plot
### plot
px.bar(df_group, x="box", y="values", color="binned",
       text=df_group['percentage'].map('{:.2f}%'.format))