I have a vector X with corresponding weights Y. Currently, I am plotting the data as a histogram, which adds up all the weights within each bin. I would instead like to plot the average weight within each bin, for example by dividing the value of each bin by the number of data points in that bin. Is there a simple way in Python to plot the average weight within each bin, rather than the sum of all the weights within each bin?
CodePudding user response:
I ended up using scipy.stats.binned_statistic() to compute the mean of the data within each bin, then plotted these means with plt.plot, placing each point on the x-axis at the midpoint of its bin. Here is some code for those curious:
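import matplotlib.pyplot as plt
from scipy import stats
# Mean of Y within each of 50 equal-width bins of X ('mean' is also the default statistic)
bin_means, bin_edges, binnumber = stats.binned_statistic(X, Y, statistic='mean', bins=50)
# Midpoint between each pair of adjacent bin edges
bin_points = (bin_edges[:-1] + bin_edges[1:]) / 2
plt.plot(bin_points, bin_means)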
Thanks to @paime for the suggestion! I used binned_statistic rather than their suggestion of calling numpy.histogram twice because, for very large data sets, generating histograms can take a long time.
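For comparison, the two-histogram approach mentioned above could look roughly like the sketch below. This is only a minimal illustration: the helper name binned_mean and the toy X and Y values are made up here and are not from the original suggestion. The idea is to histogram once with weights=Y to get the per-bin sums, histogram again on the same edges to get the per-bin counts, and divide.

```python
import numpy as np

def binned_mean(x, y, bins=50):
    """Mean of y in each bin of x, via two calls to numpy.histogram.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First call: sum of the weights y in each bin.
    sums, edges = np.histogram(x, bins=bins, weights=y)
    # Second call: number of points in each bin, reusing the same edges.
    counts, _ = np.histogram(x, bins=edges)
    # Divide only where a bin is non-empty; empty bins stay NaN.
    means = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    # Midpoints of the bins, for plotting on the x-axis.
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, means

# Toy data, made up for illustration.
X = np.array([0.1, 0.2, 0.9, 1.1, 1.8])
Y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
centers, means = binned_mean(X, Y, bins=2)
print(centers, means)
```

The resulting centers and means can be passed to plt.plot in the same way as bin_points and bin_means. Empty bins come out as NaN, which matplotlib leaves as gaps in the line.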
CodePudding user response:
If I understood you correctly, seaborn's histplot may help here. It has a stat option that controls what the y-axis shows. In your case, try stat='probability' or stat='percent' to see each bin's share of the data.
Afterwards, to display the y-axis in percent, have a look at this post
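To make clear what those two options compute, here is a small numpy-only sketch of the same normalization, using made-up toy data. It assumes the normalization is taken over the whole data set, so the bar heights sum to 1 for 'probability' and to 100 for 'percent'. Note that this rescales the bin counts; on its own it does not give the mean of Y per bin.

```python
import numpy as np

# Toy data, made up for illustration.
X = np.array([0.1, 0.2, 0.9, 1.1, 1.8])

# Plain counts per bin, as an unnormalized histogram would show.
counts, edges = np.histogram(X, bins=2)

# Share of all points that falls in each bin (heights sum to 1).
probability = counts / counts.sum()

# The same share expressed in percent (heights sum to 100).
percent = 100 * probability

print(probability, percent)
```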