Normalized and percentage plots using matplotlib

Time:12-10

The dataset that I'm currently using shows customers and their classes.

Customer     Class
4124          A
4123          A
532           B
4512          A
5325          B
642           C
5345          A

I'm using matplotlib to plot a frequency bar chart:

class_f=df.groupby(['Class']).size().reset_index(name='Frequency').sort_values('Frequency', ascending=False)
plt.bar(class_f['Class'].astype(str), class_f['Frequency'])
plt.show()

What I would like instead is to visualize the results as a normalized plot with percentage values on the y-axis. For the percentage values I've been trying to use mtick (matplotlib.ticker). For the normalized plot I've found a lot of examples that use stacked plots and seaborn. I am wondering how to do the same using matplotlib.

CodePudding user response:

The following code illustrates 3 different plots:

  • matplotlib bar plot created from the aggregated dataframe, ordered descending
  • seaborn histplot with stat='percent' (uses the order in which the classes are encountered)
  • seaborn histplot with multiple='fill' which shows the relative proportion of each class; usually an x-axis from another column is used; if not, a dummy array of zeros can be used to have just one x-position

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np

df = pd.DataFrame({'Customer': np.random.randint(1000, 10000, 30),
                   'Class': np.random.choice(['A', 'B', 'C'], 30)})
class_f = df.groupby(['Class']).size().reset_index(name='Frequency').sort_values('Frequency', ascending=False)

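fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(14, 4))

# 1) matplotlib: convert the counts to percentages of the total before plotting
ax1.bar(class_f['Class'], class_f['Frequency'] / class_f['Frequency'].sum() * 100)
ax1.yaxis.set_major_formatter(PercentFormatter(100, decimals=0))

# 2) seaborn: stat='percent' computes the percentages itself (values from 0 to 100)
sns.histplot(data=df, x='Class', stat='percent', ax=ax2)
ax2.yaxis.set_major_formatter(PercentFormatter(100, decimals=0))

# 3) seaborn: multiple='fill' stacks the classes and rescales them to fill 0..1,
#    so the formatter uses xmax=1; the zeros give a single dummy x-position
sns.histplot(data=df, x=np.zeros(len(df)), stat='percent', hue='Class', multiple='fill', ax=ax3)
ax3.yaxis.set_major_formatter(PercentFormatter(1))
ax3.set_xticks([])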

plt.tight_layout()
plt.show()

Figure: comparing plt.bar and sns.histplot
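
Since the question asks for a matplotlib-only version, here is a minimal sketch of the same idea without seaborn. It is one possible approach, not the only one: it assumes the class shares are computed with pandas' value_counts(normalize=True), reuses the seven sample rows from the question, and the label 'All customers' for the single stacked bar is an arbitrary choice made for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd

# Sample rows taken from the question
df = pd.DataFrame({'Customer': [4124, 4123, 532, 4512, 5325, 642, 5345],
                   'Class': ['A', 'A', 'B', 'A', 'B', 'C', 'A']})

# Fraction of rows per class (sums to 1), largest share first
shares = df['Class'].value_counts(normalize=True)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9, 4))

# Left: one bar per class; heights are fractions, shown as percentages
ax1.bar(shares.index.astype(str), shares.values)
ax1.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))

# Right: a single normalized (stacked) bar; each segment starts where
# the previous one ended, so the segments add up to 100%
bottom = 0.0
for cls, share in shares.items():
    ax2.bar(['All customers'], [share], bottom=bottom, label=cls)  # label is an assumption
    bottom += share
ax2.set_ylim(0, 1)
ax2.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
ax2.legend(title='Class')

plt.tight_layout()
plt.show()
```

Keeping the values as fractions and passing xmax=1 to PercentFormatter means only the tick labels change; multiplying by 100 and using PercentFormatter(100), as in the first plot above, should give the same picture.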
