For simplicity, let's say I have a dataframe with two columns, 'annual_income' and 'gender'.
I want to plot a histogram of the annual income, with each bin colored to indicate the fraction of male/female in that bin.
import numpy as np
import pandas as pd
df = {'annual_income':np.random.randint(0,10,(1000,)),'gender':np.random.randint(0,2,(1000,))}
df = pd.DataFrame(df)
df.hist(column='annual_income')
So far I can only plot the plain histogram of annual income, but I want each bin to be colored by the fraction of male/female rows that make up the bin. If there is a more standard way to visualize the same idea, I would appreciate that too.
CodePudding user response:
Use pd.cut or value_counts to bin, then pd.crosstab, as suggested in the comment:
# pass your bins here; you can also pass an array of bin edges
(pd.crosstab(pd.cut(df['annual_income'], bins=5),
             df['gender'], normalize='index')
   .plot.bar(stacked=True)
)
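Output: a stacked bar chart with one bar per income bin, each bar split by the fraction of each gender in that bin.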
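If you would rather keep the bar heights as counts, as in the original histogram, and use color only for the gender split, something along these lines might work. This is a minimal sketch rather than the only way to do it: the seeded random data, the "coolwarm" colormap, and the choice to color by the fraction of rows with gender == 1 are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Reproducible stand-in for the data in the question (the seed is an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "annual_income": rng.integers(0, 10, 1000),
    "gender": rng.integers(0, 2, 1000),
})

# Five equal-width bins over the observed income range
income_bins = pd.cut(df["annual_income"], bins=5)

# Rows per (bin, gender), then total rows per bin and per-bin fractions
table = pd.crosstab(income_bins, df["gender"])
counts = table.sum(axis=1)
fractions = table.div(counts, axis=0)

# Map the fraction of gender == 1 in each bin to a color (colormap is an assumption)
cmap = plt.get_cmap("coolwarm")
colors = [cmap(f) for f in fractions[1]]

fig, ax = plt.subplots()
positions = range(len(counts))
ax.bar(positions, counts.to_numpy(), color=colors)
ax.set_xticks(positions)
ax.set_xticklabels([str(interval) for interval in counts.index], rotation=45, ha="right")
ax.set_xlabel("annual_income bin")
ax.set_ylabel("count")

# Colorbar so the reader can translate bar color back into a fraction
mappable = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(0, 1))
fig.colorbar(mappable, ax=ax, label="fraction with gender == 1")
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. The stacked bar chart above is usually easier to read precisely, since color shades are hard to compare; this variant is mainly useful when the bin counts themselves also matter.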