I have data that contains a list of over a thousand values, such as:
[3.481, 2.413, 4.682,...]
and can plot the histogram of it easily. However, I want to save the relative frequencies (the probability of each value occurring) of those values as a list, and make sure the frequencies add up to one. I tried using
import numpy
numpy.histogram(data, density=None)
But it's not giving me what I want, as the sum doesn't equal one. And setting density=True normalizes so that the integral over the range is 1, but I want the sum to equal one. Any help would be greatly appreciated; I've looked all over for a simple solution.
CodePudding user response:
This should do the job.
def rel_freq(x):
    # x is a plain Python list; count each distinct value and normalize
    freqs = [x.count(value) / len(x) for value in set(x)]
    return freqs
However, note that the sum may not come out to exactly 1 but something like 0.9999, because of floating-point rounding error.
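If the data lives in a NumPy array rather than a plain list, the same per-value frequencies can be computed in one vectorized pass with np.unique (a sketch; `rel_freq_np` is a hypothetical helper name, and the returned values and frequencies line up element-wise):

```python
import numpy as np

def rel_freq_np(x):
    # Count each distinct value, then normalize by the total count
    values, counts = np.unique(np.asarray(x), return_counts=True)
    return values, counts / counts.sum()

values, freqs = rel_freq_np([3.481, 2.413, 4.682, 2.413])
# freqs sums to 1 up to floating-point rounding
```

This avoids the quadratic cost of calling list.count once per distinct value.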
CodePudding user response:
The np.histogram(x) function already gives you the counts of occurrences in x based on the bins. From there you simply divide by x.size to get the frequencies:
In [1]: import numpy as np
In [2]: x = np.random.random(1000)
In [3]: counts, bin_edges = np.histogram(x, bins=np.linspace(0.0, 1.0, 10))
In [4]: counts
Out[4]: array([101, 111, 119, 110, 119, 106, 120, 106, 108], dtype=int64)
In [5]: counts / x.size
Out[5]: array([0.101, 0.111, 0.119, 0.11 , 0.119, 0.106, 0.12 , 0.106, 0.108])
In [6]: np.sum(counts) / x.size
Out[6]: 1.0
Note, as @prnvbn points out, floating point arithmetic is inexact and you may not always get exactly 1.0 as the result.
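If a sum of exactly one matters, a small variation (a sketch under the same setup as above) is to normalize by counts.sum() instead of x.size — this also stays correct when some samples fall outside the bin range, since those are excluded from the counts:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)

counts, bin_edges = np.histogram(x, bins=np.linspace(0.0, 1.0, 10))
# Dividing by counts.sum() (rather than x.size) guarantees the
# frequencies are normalized over the samples that actually fell
# into the bins
freqs = counts / counts.sum()
```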