Question on Python treatment of numpy.int32 vs int


In coding up a simple Fibonacci script, I found some 'odd' behaviour in how Python treats numpy.int32 vs how it treats regular int numbers.

Can anyone help me understand what causes this behaviour?

I'm using the following Fibonacci code, leveraging caching to significantly speed things up:

from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def fibo(n):
    if n <= 1:
        return n
    else:
        return fibo(n-1) + fibo(n-2)

If I define a NumPy array of numbers to calculate over (with np.arange), it all works well until n = 47, then things start going haywire. If, on the other hand, I use a regular Python list, the values are all calculated correctly.
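For example, iterating over the two containers hands fibo different element types (a quick check on my machine, where np.arange defaults to 32-bit integers):

import numpy as np

ns_array = np.arange(48)       # elements are NumPy scalars (np.int32 here)
ns_list = list(range(48))      # elements are plain Python ints

print(type(ns_array[0]))       # <class 'numpy.int32'> on my machine
print(type(ns_list[0]))        # <class 'int'>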

You should be able to see the difference with the following:

fibo(np.int32(47)), fibo(47)

which should return (at least it does for me):

(-1323752223, 2971215073)

Obviously, something has gone very wrong in the calculation with the numpy.int32 input. Now, I can get around the issue by simply inserting an 'n = int(n)' line in the fibo function before anything else is evaluated, but I don't understand why this is necessary.
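For reference, the patched version looks like this (the same function, with the cast added at the top):

from functools import lru_cache

@lru_cache(maxsize=None)
def fibo(n):
    n = int(n)  # convert any NumPy scalar back to a plain Python int
    if n <= 1:
        return n
    else:
        return fibo(n-1) + fibo(n-2)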

I've also tried np.int(47) instead of np.int32(47), and found that the former works just fine. However, np.arange seems to default to the np.int32 data type when creating the array.

I've also tried removing the caching (I wouldn't recommend trying this: it takes around 2 hours to calculate up to n = 47), and I get the same behaviour, so that is not the cause.

Can anyone shed some insight into this for me?

Thanks

CodePudding user response:

Python's integers "have unlimited precision", as the documentation puts it. This was built into the language so that new users have "one less thing to learn".

Though that guarantee doesn't apply in your case, or for anyone using NumPy. That library is designed to make computations as fast as possible, so it uses data types that are well supported by the CPU architecture: fixed-width 32-bit and 64-bit integers that fit neatly into a CPU register and have an invariable memory footprint.
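A quick way to see that fixed footprint, compared with Python's growable integers (a small sketch):

import sys
import numpy as np

print(np.dtype(np.int32).itemsize)   # always 4 bytes
print(np.dtype(np.int64).itemsize)   # always 8 bytes

# Python ints grow to fit their value instead:
print(sys.getsizeof(1))          # a few dozen bytes of object overhead
print(sys.getsizeof(10**100))    # noticeably larger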

But then we're back to dealing with overflow problems like in any other programming language. NumPy does warn about that though:

>>> print(fibo(np.int32(47)))
fib.py:9: RuntimeWarning: overflow encountered in long_scalars
  return fibo(n-1) + fibo(n-2)
-1323752223

Here we are using a signed 32-bit integer. The largest positive number it can hold is 2**31 - 1 = 2147483647. But the 47th Fibonacci number is larger than that: it's 2971215073, as you calculated. The computation therefore overflows, and 2971215073 wraps around to -1323752223 under two's-complement arithmetic:

>>> 2971215073 + 1323752223 == 2**32
True
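You can trigger the same wraparound directly with 32-bit scalars (a sketch; the exact warning text may vary by NumPy version):

import numpy as np

big = np.int32(2**31 - 1)    # 2147483647, the int32 maximum
print(big + np.int32(1))     # wraps to -2147483648, with a RuntimeWarning

# Casting the true 47th Fibonacci number down to int32 wraps the same way:
print(np.array([2971215073]).astype(np.int32)[0])   # -1323752223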

It worked with np.int because that was simply an alias of the built-in int (the alias has since been deprecated and removed in newer NumPy releases), so it returns a Python integer:

>>> np.int is int
True

For more on this, see: What is the difference between native int type and the numpy.int types?

Also note that np.arange with integer arguments returns an array of type np.int_ (with a trailing underscore, unlike np.int). That data type is platform-dependent: it maps to the C long, which is 32-bit on Windows but 64-bit on most Linux builds. (NumPy 2.0 later switched Windows to a 64-bit default as well.)
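You can check which one you get (a quick sanity check; the output depends on platform and NumPy version):

import numpy as np

arr = np.arange(48)
print(arr.dtype)     # int32 on Windows (pre-NumPy 2.0), int64 on typical Linux builds
print(type(arr[0]))  # the matching NumPy scalar type, e.g. <class 'numpy.int32'>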
