I am surprised to see that the probability density doesn't sum to 1. Is there a tweak to make it equal to 1?
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
plt.style.use('seaborn-v0_8-deep')  # this style was named 'seaborn-deep' before Matplotlib 3.6
#input file is a flat file that contains portfolio holdings and characteristics
input_file = r'\\CP\file.xls'
df = pd.read_excel(input_file,header=6)
#number of lines in Fund is 123
df_Fund=df[(df['Port. Weight']>0)]
#number of lines in Bench is 214
df_Bench=df[(df['Bench. Weight']>0)]
#Delta distribution
x = df_Fund['Delta']
y = df_Bench['Delta']
plt.hist([x,y],bins=10, density=True, range=(0,100), label=['Fund','Bench'])
plt.legend(loc='upper right')
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.title('Delta Breakdown')  # call the function; "plt.title = ..." would overwrite it instead of setting a title
plt.show()
[Graph: histogram of Delta for Fund and Bench]
Answer:
From the matplotlib documentation for plt.hist:
density : bool, default: False
If True, draw and return a probability density: each bin will display the bin's raw count divided by the total number of counts and the bin width (density = counts / (sum(counts) * np.diff(bins))), so that the area under the histogram integrates to 1 (np.sum(density * np.diff(bins)) == 1).
If stacked is also True, the sum of the histograms is normalized to 1.
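Written out, with $c_i$ the count in bin $i$, $N = \sum_j c_j$ the total count and $\Delta_i$ the width of bin $i$ (symbol names chosen here for illustration), the quoted definition gives:

$$h_i = \frac{c_i}{N\,\Delta_i}, \qquad \sum_i h_i\,\Delta_i = \sum_i \frac{c_i}{N} = 1.$$

With equal widths $\Delta_i = \Delta$ this means $\sum_i h_i = 1/\Delta$, so the bar heights themselves only sum to 1 when $\Delta = 1$.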
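Note that the bar heights are densities: they are already divided by the bin width, so it is the area (height × bin width) that sums to 1, not the heights themselves. With range=(0,100) and bins=10, every bin is exactly 10 wide, so I would expect the bar heights to sum to 1/10 = 0.1 instead of 1.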
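A quick numerical check of this, as a sketch on made-up data (the random `delta` values below are a hypothetical stand-in for `df_Fund['Delta']`), using `np.histogram`, which accepts the same `bins`, `range` and `density` arguments as `plt.hist`:

```python
import numpy as np

# Hypothetical stand-in for df_Fund['Delta']: 123 values in [0, 100)
rng = np.random.default_rng(0)
delta = rng.uniform(0, 100, size=123)

# Same settings as the question: 10 bins over (0, 100), density=True
heights, edges = np.histogram(delta, bins=10, range=(0, 100), density=True)
widths = np.diff(edges)  # every bin is 10 wide

print(heights.sum())             # ≈ 0.1, i.e. 1 / bin width
print((heights * widths).sum())  # ≈ 1.0, the area under the histogram
```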
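The way to read your graph is: for every x between 50 and 60 the probability density is 1.75% per unit of Delta, so the probability of a value falling between 50 and 60 is 10 × 1.75% = 17.5%.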
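So in order to "tweak" it to 1, you can either use a bin width of 1:
bins=range(101)
(edges 0, 1, ..., 100; range(100) only produces edges up to 99, so values between 99 and 100 would be dropped), or, as mentioned in the other answers, normalize the counts yourself instead of using density=True.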
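Both tweaks, sketched with `plt.hist` on made-up data (the uniform random `x` and `y` are hypothetical stand-ins for the Fund and Bench Delta columns). `range(101)` gives 100 unit-wide bins covering 0 to 100, and the `range=` argument has no effect when `bins` is a sequence of edges. In the second option, weights of `1/len(data)` make each bar the fraction of holdings in that bin, which is what `PercentFormatter(1)` then displays as a percentage:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

# Hypothetical stand-ins for df_Fund['Delta'] and df_Bench['Delta']
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=123)
y = rng.uniform(0, 100, size=214)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: unit-width bins (edges 0, 1, ..., 100), so each density equals a fraction
n_unit, _, _ = ax1.hist([x, y], bins=range(101), density=True, label=['Fund', 'Bench'])

# Option 2: keep 10 bins, but weight each holding by 1/len(data) instead of density=True
weights = [np.ones(len(x)) / len(x), np.ones(len(y)) / len(y)]
n_frac, _, _ = ax2.hist([x, y], bins=10, range=(0, 100), weights=weights,
                        label=['Fund', 'Bench'])

for ax in (ax1, ax2):
    ax.yaxis.set_major_formatter(PercentFormatter(1))
    ax.legend(loc='upper right')
ax2.set_title('Delta Breakdown')

print([h.sum() for h in n_unit])  # each ≈ 1.0
print([h.sum() for h in n_frac])  # each ≈ 1.0
```

Call `plt.show()` instead of relying on the Agg backend when running interactively. The weights option keeps the original 10-point bins, which is usually easier to read than 100 thin bars.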
Answer:
If you want it to sum to 1, divide each component by the total sum.
For example, if you are summing up some components and they sum to a number X:
$x_0 + x_1 + x_2 + \dots = X$
then dividing each component by the total gives
$\frac{x_0}{X} + \frac{x_1}{X} + \frac{x_2}{X} + \dots = \frac{x_0 + x_1 + x_2 + \dots}{X} = \frac{X}{X} = 1$
which is how you normalise any distribution (if the distribution is continuous then the sum becomes an integral)
Hopefully that helps.
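Applied to a histogram, this means dividing each bin count by the total count. A sketch with NumPy on hypothetical data (`delta` stands in for `df_Fund['Delta']`), drawing the normalized bars with `plt.bar`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

# Hypothetical stand-in for df_Fund['Delta']
rng = np.random.default_rng(1)
delta = rng.uniform(0, 100, size=123)

# Raw counts per bin (no density)
counts, edges = np.histogram(delta, bins=10, range=(0, 100))

# Divide each component by the total so the components sum to 1
probs = counts / counts.sum()

# Draw the normalized bars on the original bin edges
plt.bar(edges[:-1], probs, width=np.diff(edges), align='edge', label='Fund')
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.legend(loc='upper right')

print(probs.sum())  # ≈ 1.0
```

Dividing by `counts.sum()` rather than `len(delta)` leaves any values outside `range` out of the total; divide by `len(delta)` instead if those should count against the 100%.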