Home > Enterprise >  Numpy power returns negative value
Numpy power returns negative value

Time:12-24

I want to plot the Poisson distribution and get negative probabilities for lambda >= 9.

This code generates plots for different lambdas:

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial

for lambda_val in range(1, 12, 2):

    plt.figure()
    
    k = np.arange(0,20)
    y = np.power(lambda_val, k)*np.exp(-lambda_val)/factorial(k)

    plt.bar(k, y)
    plt.title('lambda = '   str(lambda_val))
    plt.xlabel('k')
    plt.ylabel('probability')
    plt.ylim([-0.1, 0.4])
    plt.grid()
    plt.show()

Please see these two plots:

Lambda = 5 looks fine in my opinion.

enter image description here

Lambda = 9 not.

enter image description here

I'm quite sure it has something to do with np.power because

np.power(11, 9)

gives me: -1937019605, whereas

11**9

gives me: 2357947691 (same in WolframAlpha).

But if I avoid np.power and use

y = (lambda_val**k)*math.exp(-lambda_val)/factorial(k)

for calculating the probability, I get negative values as well. I am totally confused. Can anybody explain me the effect or what am I doing wrong? Thanks in advance. :)

CodePudding user response:

Your problem is due to 32-bit integer overflows. This happens because Numpy is sometimes compiled with 32-bit integer even though the platform (OS processor) is a 64-bit one. There is an overflow because Numpy automatically transform the unbounded integer of the Python interpreter to the native np.int_ type. You can check if this type is a 64-bit one using np.int_ is np.int64. AFAIK, the default Numpy binary package compiled for Windows available on Python Pip use 32-bit integers and the one of the Linux packages use 64-bit integers (assuming you are on a 64-bit platform).

The issue can be easily reproduced using:

In [546]: np.power(np.int32(11), np.int32(9))
Out[546]: -1937019605

It can also be solved using:

In [547]: np.power(np.int64(11), np.int64(9))
Out[547]: 2357947691

In the second expression, you use k which is of type np.int_ by default and this is certainly why you get the same problem. Hopefully, you can specify to Numpy that the integer should be bigger. Note that Numpy have some implicit rule to avoid overflow but this is hard to avoid them in all case without strongly impacting performance. Here is a fixed formula:

k = np.arange(0, 20, dtype=np.int64)
y = np.power(lambda_val, k) * np.exp(-lambda_val) / factorial(k)

The rule of thumb is to be very careful about implicit conversions when you get unexpected results.

  • Related