How can you make a python histogram percentage sum to 100%?

Time:07-16

I am struggling to make a histogram in which the percentages of all events sum to 100%. For this particular example they sum to only about 3%. Can anyone show me how to make the percentages sum to 100% for any input array?

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
data = np.array([0,9,78,6,44,23,88,77,12,29])
length_of_data = len(data)          # Length of data
bins = int(np.sqrt(length_of_data)) # Choose number of bins
y = data

plt.title('Histogram')
plt.ylabel('Percentage Of Events')
plt.xlabel('bins')
plt.hist(y,bins=bins, density = True)
plt.show()
print(bins)
```
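Why the total comes out near 3%: with density=True the bars are normalized so that their area (height times bin width) sums to 1, not their heights. The sketch below uses np.histogram, assuming it produces the same heights that plt.hist draws (plt.hist calls it internally). With 3 bins of width 88/3 the heights sum to 3/88, roughly 0.034, which PercentFormatter(1) displays as about 3.4%.

```python
import numpy as np

data = np.array([0, 9, 78, 6, 44, 23, 88, 77, 12, 29])
bins = int(np.sqrt(len(data)))  # 3 bins, as in the question

# Same bar heights that plt.hist(..., density=True) would draw
heights, edges = np.histogram(data, bins=bins, density=True)
widths = np.diff(edges)  # every bin is 88/3 wide here

print(heights.sum())             # ≈ 0.0341: the sum of heights is 1 / bin width, not 1
print((heights * widths).sum())  # area under the histogram: 1 (up to rounding)
```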

Answer:

One way to do it is to take the bin heights that plt.hist returns, rescale them so they sum to 100, and then set each bar (patch) to its new height. It isn't complicated once you know how, but it isn't ideal either. Here is your case:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np

plt.gca().yaxis.set_major_formatter(PercentFormatter(100))    # <-- changed here
data = np.array([0,9,78,6,44,23,88,77,12,29])
length_of_data = len(data)          # Length of data
bins = int(np.sqrt(length_of_data)) # Choose number of bins
y = data

plt.title('Histogram')
plt.ylabel('Percentage Of Events')
plt.xlabel('bins')
#### Setting new heights
n, bins, patches = plt.hist(y, bins=bins, density = True, edgecolor='k')
scaled_n = n / n.sum() * 100   # rescale so the bar heights sum to 100
for new_height, patch in zip(scaled_n, patches):
    patch.set_height(new_height)
####
# Plot the cumulative sum at the bin centres as a check: it should end at 100%
plt.plot((bins[1:] + bins[:-1])/2, scaled_n.cumsum())
# If you want the cumsum to start from 0, uncomment the line below
#plt.plot(np.concatenate([[0], (bins[1:] + bins[:-1])/2]), np.concatenate([[0], scaled_n.cumsum()]))
plt.ylim(top=110)
plt.show()
```

[Figure: resulting plot]
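Dividing the density heights by their sum works here because all bins have the same width, so every height is that bin's count times the same constant. A small numpy check of this reasoning; the unequal edges in the second half are hypothetical, chosen only to show where the rescaling trick stops matching the share of events per bin.

```python
import numpy as np

data = np.array([0, 9, 78, 6, 44, 23, 88, 77, 12, 29])

# Equal-width bins (as in the answer): rescaled density equals share of events
density, edges = np.histogram(data, bins=3, density=True)
counts, _ = np.histogram(data, bins=3)
rescaled = density / density.sum() * 100
percent = counts / counts.sum() * 100
print(np.allclose(rescaled, percent))  # True

# Unequal-width bins: each height is divided by a different width,
# so dividing by the sum of heights no longer gives the share of events
uneven = [0, 10, 50, 88]  # hypothetical edges, for illustration only
density_u, _ = np.histogram(data, bins=uneven, density=True)
counts_u, _ = np.histogram(data, bins=uneven)
print(np.allclose(density_u / density_u.sum() * 100,
                  counts_u / counts_u.sum() * 100))  # False
```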

As others have said, you can also use seaborn, whose histplot accepts stat='percent'. The code below reproduces the plot above; you would still need to add the labels and styling you want.

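```python
import seaborn as sns

# stat='percent' scales the bars so that their heights sum to 100
sns.histplot(data, bins=int(np.sqrt(length_of_data)), stat='percent')
# Overlay the cumulative percentage as an unfilled line in color C1
sns.histplot(data, bins=int(np.sqrt(length_of_data)), stat='percent', cumulative=True, element='poly', fill=False, color='C1')
plt.show()
```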

[Figure: resulting plot]
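Another commonly used approach, not the one in this answer, is to skip density=True and pass the weights parameter of hist so that every event counts as 1/N. Each bar's height is then directly its fraction of all events, the heights sum to 1, and PercentFormatter(1) shows them as percentages. A minimal sketch, assuming the same data and bin count as the question:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

data = np.array([0, 9, 78, 6, 44, 23, 88, 77, 12, 29])
bins = int(np.sqrt(len(data)))

fig, ax = plt.subplots()
# Weight each event by 1/N so each bar height is that bin's fraction of all events
weights = np.ones(len(data)) / len(data)
fractions, edges, patches = ax.hist(data, bins=bins, weights=weights, edgecolor='k')
# Heights are fractions in [0, 1], so 1 on the axis corresponds to 100%
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax.set_title('Histogram')
ax.set_xlabel('bins')
ax.set_ylabel('Percentage Of Events')
plt.show()
```

Unlike rescaling density heights, this also gives per-bin shares when the bins have unequal widths, since no division by bin width is involved.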
