How to save the relative frequencies of a data set as a list?

Time:09-29

I have data that contains a list of over a thousand values, such as:

[3.481, 2.413, 4.682,...]

and can plot the histogram of it easily. However, I want to save the relative frequencies (or probability of each value occurring) of those values as a list, and make sure the frequencies add up to one. I tried using

import numpy

numpy.histogram(data, density=None)

But it's not giving me what I want, as the sum doesn't equal one. Setting density=True normalizes the result so that the integral over the range is 1, but I want the sum to equal one. Any help would be greatly appreciated; I have looked all over the place for a simple piece of code.

CodePudding user response:

This should do the job.

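from collections import Counter

def rel_freq(x):
    # Tally each distinct value in one pass, then divide by the total count.
    counts = Counter(x)
    return [count / len(x) for count in counts.values()]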

However, note that the sum may not come out to exactly 1 but to something like 0.9999999999999999, because of floating-point rounding error.
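
A minimal sketch of the same idea that keeps each value paired with its frequency and checks the total with a tolerance instead of exact equality. The helper name rel_freq_by_value and the sample data are illustrative assumptions, not part of the original answer.

```python
import math
from collections import Counter


def rel_freq_by_value(x):
    """Map each distinct value in x to its relative frequency."""
    n = len(x)
    # Counter tallies every distinct value in a single pass.
    return {value: count / n for value, count in Counter(x).items()}


data = [3.481, 2.413, 4.682, 2.413, 3.481, 2.413]  # made-up sample values
freqs = rel_freq_by_value(data)

# fsum reduces accumulated rounding error when adding many floats.
total = math.fsum(freqs.values())

# Compare with a tolerance rather than testing total == 1.0.
print(freqs)
print(math.isclose(total, 1.0))
```

If only the list of frequencies is needed, list(freqs.values()) gives it in the order the values first appeared in the data.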

CodePudding user response:

The np.histogram(x) function already gives you the counts of occurrences in x based on the bins. From there you simply divide by x.size to get the frequencies:

In [1]: import numpy as np

In [2]: x = np.random.random(1000)

In [3]: counts, bin_edges = np.histogram(x, bins=np.linspace(0.0, 1.0, 10))

In [4]: counts
Out[4]: array([101, 111, 119, 110, 119, 106, 120, 106, 108], dtype=int64)

In [5]: counts / x.size
Out[5]: array([0.101, 0.111, 0.119, 0.11 , 0.119, 0.106, 0.12 , 0.106, 0.108])

In [6]: np.sum(counts) / x.size
Out[6]: 1.0

Note, as @prnvbn points out, floating point arithmetic is inexact, so summing the individual frequencies (np.sum(counts / x.size)) may not always give exactly 1.0.
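
Regarding the density=True behaviour mentioned in the question, here is a short sketch of how it relates to relative frequencies: multiplying each density value by its bin width should give the same numbers as dividing the counts by their total. The random data, the seed, and the choice of 10 bins are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
x = rng.random(1000)            # stand-in for the real data

# Relative frequency per bin: counts divided by the number of binned samples.
counts, bin_edges = np.histogram(x, bins=10)
rel_freq = counts / counts.sum()

# density=True makes the integral equal 1, so density * bin width sums to 1.
density, _ = np.histogram(x, bins=10, density=True)
rel_freq_from_density = density * np.diff(bin_edges)

# Plain Python list, as asked for in the question.
rel_freq_list = rel_freq.tolist()

print(np.isclose(rel_freq.sum(), 1.0))
print(np.allclose(rel_freq, rel_freq_from_density))
```

Dividing by counts.sum() rather than x.size keeps the total at one even when the chosen bins do not cover every sample.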
