pandas plot histogram of column with color indicating the fraction of counts belonging to a category

Time:03-22

For simplicity, let's say I have a DataFrame with two columns, 'annual_income' and 'gender'.

I want to plot a histogram of the annual income where the color of each bin indicates the fraction of males/females in that bin.

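import numpy as np
import pandas as pd

# Toy data: 1000 rows, income in 0..9, gender encoded as 0/1
df = pd.DataFrame({
    'annual_income': np.random.randint(0, 10, 1000),
    'gender': np.random.randint(0, 2, 1000),
})

# Plain histogram of income (no gender information yet)
df.hist(column='annual_income')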

So far I can only plot the histogram of annual income, but I want each bin to be colored by the fraction of males/females that make up that bin. If there is a more standard way to visualize the same idea, I would appreciate that too.
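A minimal sketch of the literal request follows: a single histogram whose bar heights are the counts per bin and whose bar colors encode the fraction of gender 1 in that bin. It assumes gender is coded 0/1 as in the toy data, so the per-bin mean of that column is the fraction. The seeded generator, bins=5, and the coolwarm colormap are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable

# Toy data shaped like the question's; gender is assumed to be coded 0/1
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'annual_income': rng.integers(0, 10, 1000),
    'gender': rng.integers(0, 2, 1000),
})

# Bin the income column (bins=5 is an arbitrary illustrative choice)
binned = pd.cut(df['annual_income'], bins=5)

# Per bin: number of rows (bar height) and mean of the 0/1 gender column,
# which equals the fraction of rows with gender == 1 (bar color)
summary = df.groupby(binned, observed=False)['gender'].agg(
    n_rows='size', frac_gender_1='mean')

# A fraction in [0, 1] maps directly onto a colormap
cmap = plt.cm.coolwarm
colors = cmap(summary['frac_gender_1'].to_numpy())

fig, ax = plt.subplots()
ax.bar([str(iv) for iv in summary.index], summary['n_rows'], color=colors)
ax.set_xlabel('annual_income bin')
ax.set_ylabel('count')
ax.tick_params(axis='x', labelrotation=45)

# Colorbar explaining what the bar colors mean
sm = ScalarMappable(cmap=cmap, norm=plt.Normalize(0, 1))
sm.set_array([])
fig.colorbar(sm, ax=ax, label='fraction with gender == 1')
fig.tight_layout()
plt.show()
```

A diverging colormap makes a 50/50 bin look neutral, so bins skewed toward either gender stand out; a sequential colormap would work too.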

Answer:

Use pd.cut (or value_counts) to bin the incomes, then pd.crosstab with normalize='index' to turn the counts in each bin into fractions, as suggested in the comment:

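import matplotlib.pyplot as plt

# pass your bins here (you can also pass an array of bin edges);
# normalize='index' turns each bin's counts into fractions that sum to 1
(pd.crosstab(pd.cut(df['annual_income'], bins=5),
             df['gender'], normalize='index')
   .plot.bar(stacked=True)
)
plt.show()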

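Output (image not preserved): a stacked bar chart with one bar per income bin, where the segments show the fraction of each gender in that bin.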
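Because normalize='index' rescales every bar to height 1, the chart above no longer shows how many people fall in each bin. If you want to keep the histogram shape, a sketch of the more classic stacked count histogram is to drop normalize and stack the raw counts. The seeded toy data below is an assumption for illustration; the code also checks that dividing each row by its total reproduces the normalized table from the answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data shaped like the question's (gender coded 0/1)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'annual_income': rng.integers(0, 10, 1000),
    'gender': rng.integers(0, 2, 1000),
})

# Same binning as the answer, but raw counts per (bin, gender) pair
binned = pd.cut(df['annual_income'], bins=5)
counts = pd.crosstab(binned, df['gender'])

# normalize='index' is equivalent to dividing each row by its row total
fractions = counts.div(counts.sum(axis=1), axis=0)

# Stacked bars: total bar height is the histogram count for the bin,
# and the segments show how that count splits by gender
ax = counts.plot.bar(stacked=True)
ax.set_ylabel('count')
plt.show()
```

The count version answers "how many, and of whom" at once, while the normalized version makes per-bin proportions easier to compare when bin sizes differ a lot.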
