How can I store float probabilities to a file so exactly that they sum up to 1?

Time:01-06

I want to store a numpy array to a file. This array contains thousands of float probabilities which all sum up to 1. But when I store the array to a CSV file and load it back, I realise that the numbers have been approximated, and their sum is now some 0.9999 value. How can I fix it?

(Numpy's random choice method requires probabilities to sum up to 1)

CodePudding user response:

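Try using np.savetxt. Its default format ('%.18e') writes enough digits to restore every double exactly when the file is loaded again.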

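import numpy as np

# Build a random probability vector that sums to 1.
arr = np.random.random(1000)
arr /= arr.sum()

# Write one value per line using savetxt's default format.
np.savetxt('arr.csv', arr, delimiter=',')

# Read the values back and check the sum.
arr = np.loadtxt('arr.csv', delimiter=',')
print(arr.sum())
# >>> 1.0 (up to floating-point rounding in the last digit)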

CodePudding user response:

Due to floating point arithmetic errors, you can get tiny errors in what seem like ordinary calculations. However, in order to use the choice function, the probabilities don't need to be perfect.

On reviewing the code in the current version of NumPy as obtained from GitHub, I see that the tolerance for the sum of probabilities is that sum(p) is within sqrt(eps) of 1, where eps is the double precision floating point epsilon, which is approximately 2.2e-16. So the tolerance is about 1.5e-8. (See lines 955 and 973 in numpy/random/mtrand.pyx.)
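A small sketch of what that tolerance means in practice. The four-element probability vector and the perturbation sizes are made-up values for illustration; only np.finfo and np.random.choice are real NumPy calls.

```python
import numpy as np

# Tolerance as described above: square root of double-precision epsilon.
eps = np.finfo(np.float64).eps
atol = np.sqrt(eps)  # roughly 1.5e-8

p = np.full(4, 0.25)  # hypothetical probabilities that sum to 1

# Sum is off by 1e-10: inside the tolerance, so choice accepts it.
p_close = p.copy()
p_close[0] -= 1e-10
sample = np.random.choice(4, p=p_close)

# Sum is off by 1e-4: outside the tolerance, so choice raises ValueError.
p_far = p.copy()
p_far[0] -= 1e-4
try:
    np.random.choice(4, p=p_far)
    rejected = False
except ValueError:
    rejected = True
```

A sum of "some 0.9999 value", as in the question, falls in the second case, which is why the truncated CSV was rejected.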

Farther down in mtrand.pyx, choice normalizes the probabilities (which are already almost normalized) to sum to 1; see line 1017.
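The normalization step can be pictured roughly as follows. This is an illustration of the idea (cumulative sum, rescale, inverse-CDF lookup), not a copy of the NumPy source, and the probability values are assumptions chosen for the example.

```python
import numpy as np

# Probabilities that are off from 1 by a tiny amount (hypothetical values).
p = np.array([0.2, 0.3, 0.5]) * (1 - 1e-10)

# Cumulative distribution, rescaled so that its last entry is exactly 1.
cdf = np.cumsum(p)
cdf /= cdf[-1]

# Inverse-CDF sampling: find where each uniform draw falls within the CDF.
rng = np.random.default_rng(0)
u = rng.random(10)
idx = np.searchsorted(cdf, u, side="right")
```

Because the CDF is divided by its own last entry, any small deficit in the sum is spread proportionally over all outcomes.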

My advice is to ensure that all significant digits are stored in the CSV (17 significant digits restore a double exactly, and even 16 leave an error far below the tolerance). Then, when you read them back, the error in the sum will be much smaller than 1e-8 and choice will be happy. I think other people commenting here have posted some advice about how to print all digits.
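One way to do this, as a minimal sketch: pass an explicit format to np.savetxt. The file name p.csv, the seed, and the '%.4f' comparison format are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(1000)
p /= p.sum()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "p.csv")

    # 17 significant digits are enough to restore each double exactly.
    np.savetxt(path, p, fmt="%.17g")
    p_full = np.loadtxt(path)

    # A short format, for comparison, rounds every value.
    np.savetxt(path, p, fmt="%.4f")
    p_short = np.loadtxt(path)

exact = np.array_equal(p, p_full)      # bit-for-bit round trip
changed = not np.array_equal(p, p_short)
print(p_full.sum(), p_short.sum())
```

With the full-precision format the loaded array is identical to the original, so its sum is whatever it was before saving. Binary formats such as np.save avoid the text conversion altogether.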
