I want to plot the Poisson distribution and get negative probabilities for lambda >= 9.
This code generates plots for different lambdas:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial
for lambda_val in range(1, 12, 2):
plt.figure()
k = np.arange(0,20)
y = np.power(lambda_val, k)*np.exp(-lambda_val)/factorial(k)
plt.bar(k, y)
plt.title('lambda = ' str(lambda_val))
plt.xlabel('k')
plt.ylabel('probability')
plt.ylim([-0.1, 0.4])
plt.grid()
plt.show()
Please see these two plots:
Lambda = 5 looks fine in my opinion.
Lambda = 9 not.
I'm quite sure it has something to do with np.power because
np.power(11, 9)
gives me: -1937019605, whereas
11**9
gives me: 2357947691 (same in WolframAlpha).
But if I avoid np.power and use
y = (lambda_val**k)*math.exp(-lambda_val)/factorial(k)
for calculating the probability, I get negative values as well. I am totally confused. Can anybody explain me the effect or what am I doing wrong? Thanks in advance. :)
CodePudding user response:
Your problem is due to 32-bit integer overflows. This happens because Numpy is sometimes compiled with 32-bit integer even though the platform (OS processor) is a 64-bit one. There is an overflow because Numpy automatically transform the unbounded integer of the Python interpreter to the native np.int_
type. You can check if this type is a 64-bit one using np.int_ is np.int64
. AFAIK, the default Numpy binary package compiled for Windows available on Python Pip use 32-bit integers and the one of the Linux packages use 64-bit integers (assuming you are on a 64-bit platform).
The issue can be easily reproduced using:
In [546]: np.power(np.int32(11), np.int32(9))
Out[546]: -1937019605
It can also be solved using:
In [547]: np.power(np.int64(11), np.int64(9))
Out[547]: 2357947691
In the second expression, you use k
which is of type np.int_
by default and this is certainly why you get the same problem. Hopefully, you can specify to Numpy that the integer should be bigger. Note that Numpy have some implicit rule to avoid overflow but this is hard to avoid them in all case without strongly impacting performance. Here is a fixed formula:
k = np.arange(0, 20, dtype=np.int64)
y = np.power(lambda_val, k) * np.exp(-lambda_val) / factorial(k)
The rule of thumb is to be very careful about implicit conversions when you get unexpected results.