Stacked histogram with percentage annotations

Time:06-08

I have a pandas DataFrame that looks like this:

    Box Value
0   A   90
1   B   16
2   B   26
3   B   36
4   C   92
5   A   12
6   B   14
7   C   84
8   A   27
9   B   11

I want to create a stacked or grouped histogram in Plotly that displays, for each box, the percentage of that box's values falling into each bin. For example, if box A has 20 values in total and 5 of them fall in the bin 0-10, the bar for box A in that bin should display 25%, and so on.

Note: I don't want the percentage computed against the total count in a bin across all boxes. I want it computed per box, against that box's own total.

What changes do I need to make here?

px.histogram(sample_data,
                   color="Box",
                   nbins=10,
                   barmode="group",
                   range_x=[0, 100],
                  )

Answer:

imports

import pandas as pd
import plotly.express as px

data

###dummy data
box = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B']
values = [1,1,1,9,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,1]
df = pd.DataFrame({'box':box, 'values':values})
df


data transforms


### bin your data
bins = [0, 10, 25, 50, 100]
df['binned'] = pd.cut(df['values'], bins)

### count values per box and bin (observed=False keeps empty bins as zero counts)
df_group = df.groupby(['box', 'binned'], observed=False)['values'].count().reset_index()

### calculate percentages per box: each count divided by that box's total
df_group['percentage'] = 100 * df_group['values'] / df_group.groupby('box')['values'].transform('sum')
df_group
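
As a cross-check on the per-box percentages, pd.crosstab with normalize='index' computes the same row-wise shares in one call. This is a minimal sketch on made-up data: the `check` frame, its values, and the bin labels are illustrative assumptions, not the data above.

```python
import pandas as pd

# hypothetical data: box A has 4 values, box B has 2
check = pd.DataFrame({'box': ['A', 'A', 'A', 'A', 'B', 'B'],
                      'values': [5, 8, 12, 60, 3, 30]})

# same bin edges as above, with plain string labels for readability
check['binned'] = pd.cut(check['values'], [0, 10, 25, 50, 100],
                         labels=['0-10', '10-25', '25-50', '50-100'])

# normalize='index' divides each row (one row per box) by that row's total
pct = pd.crosstab(check['box'], check['binned'], normalize='index') * 100
print(pct.round(1))
```

Each row of `pct` sums to 100, which is the property the question asks for: the percentages are relative to the box, not to the bin.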


plot

### plot
px.bar(df_group, x="box", y="values", color="binned",text=df_group['percentage'].apply(lambda x: '{0:1.2f}%'.format(x)))
