The dataset that I'm currently using shows customers and their classes.
Customer Class
4124 A
4123 A
532 B
4512 A
5325 B
642 C
5345 A
I'm using matplotlib to plot a frequency bar chart:
class_f=df.groupby(['Class']).size().reset_index(name='Frequency').sort_values('Frequency', ascending=False)
plt.bar(class_f['Class'].astype(str), class_f['Frequency'])
plt.show()
However, I would like to visualize the results as a normalized plot with percentage values on the y-axis. For the percentage values I've been trying to use mtick. For the normalized plot I've found a lot of examples that use stacked plots and seaborn. I am wondering how to do the same using matplotlib.
Answer:
The following code illustrates 3 different plots:
- matplotlib bar plot created from the aggregated dataframe, ordered descending
- seaborn histplot with stat='percent' (uses the order in which the classes are encountered)
- seaborn histplot with multiple='fill', which shows the relative proportion of each class; usually an x-axis from another column is used; if not, a dummy array of zeros can be used to have just one x-position
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'Customer': np.random.randint(1000, 10000, 30),
'Class': np.random.choice(['A', 'B', 'C'], 30)})
class_f = df.groupby(['Class']).size().reset_index(name='Frequency').sort_values('Frequency', ascending=False)
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(14, 4))
# 1) plain matplotlib: convert the counts to percentages (0-100) before plotting
ax1.bar(class_f['Class'], class_f['Frequency'] / class_f['Frequency'].sum() * 100)
ax1.yaxis.set_major_formatter(PercentFormatter(100, decimals=0))
# 2) seaborn computes the percentages itself with stat='percent'
sns.histplot(data=df, x='Class', stat='percent', ax=ax2)
ax2.yaxis.set_major_formatter(PercentFormatter(100, decimals=0))
# 3) multiple='fill' stacks the classes to a total of 1, hence PercentFormatter(1)
sns.histplot(data=df, x=np.zeros(len(df)), stat='percent', hue='Class', multiple='fill', ax=ax3)
ax3.yaxis.set_major_formatter(PercentFormatter(1))
ax3.set_xticks([])
plt.tight_layout()
plt.show()
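Since the question asks for a matplotlib-only version, here is a minimal sketch that avoids seaborn. It assumes the same 'Class' column as in the question; the helper name plot_class_shares and the 'all customers' tick label are made up for illustration. The shares come from pandas value_counts(normalize=True), so they are fractions that sum to 1 and PercentFormatter(xmax=1) is used on both axes.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd


def plot_class_shares(df):
    """Hypothetical helper: normalized bar chart plus a single stacked bar."""
    # Fraction of rows per class, sorted descending by default
    shares = df['Class'].value_counts(normalize=True)

    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9, 4))

    # Left: one bar per class, heights are fractions shown as percentages
    ax1.bar(shares.index.astype(str), shares.values)
    ax1.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))

    # Right: stack the classes on one x-position so the bar fills 0 to 100%
    bottom = 0.0
    for label, share in shares.items():
        ax2.bar(['all customers'], [share], bottom=bottom, label=str(label))
        bottom += share
    ax2.set_ylim(0, 1)
    ax2.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
    ax2.legend(title='Class')

    fig.tight_layout()
    return fig, shares


# Sample rows copied from the question
df = pd.DataFrame({'Customer': [4124, 4123, 532, 4512, 5325, 642, 5345],
                   'Class': ['A', 'A', 'B', 'A', 'B', 'C', 'A']})
fig, shares = plot_class_shares(df)
# Call plt.show() afterwards to display the figure
```

The stacked variant is just repeated ax.bar calls with a running bottom value, which is roughly what multiple='fill' does for a single x-position.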