Strange numpy divide behaviour for scalars-CodePudding

I have been trying to upgrade a library which has a bunch of geometric operations for scalars so they will work with numpy arrays as well. While doing this I noticed some strange behaviour with numpy divide.

In original code checks a normalised difference between to variables if neither variable is zero, swapping across to numpy this ended up looking something like:

import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 3, 0, 4])

o = np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(a != 0, b != 0))
print(f'First implementation: {o}')

where I passed in a output buffer initialised to zero for instances which could not be calculated; this returns:

First implementation: [ 0.         -0.5        -0.33333333  0.          0.        ]

I had to slightly modify this for scalars as out required an array, but it seemed fine.

a = 0
b = 4
o = None if np.isscalar(a) else np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(b != 0, a != 0))
print(f'Modified out for scalar: {o}')

returns

Modified out for scalar: 0.0.

Then ran this through some test functions and found a lot of them failed. Digging into this, I found that the first time I call the divide with a scalar with where set to False the function returns zero, but if I called it again, the second time it returns something unpredictable.

a = 0
b = 4
print(f'First divide: {np.divide(b, a, where=False)}')
print(f'Second divide: {np.divide(b, a, where=False)}')

returns

First divide: 0.0
Second divide: 4.0

Looking at the documentation, it says "locations within it where the condition is False will remain uninitialized", so I guess numpy as some internal buffer which is initially set to zero then subsequently it ends up carrying over an earlier intermediate value.

I am struggling to see how I can use divide with or without a where clause; if I use where I get an unpredictable output and if I don't I can't protect against divide by zero. Am I missing something or do I just need to have a different code path in these cases? I realise I'm half way to a different code path already with the out variable.

I would be grateful for any advice.

CodePudding user response：

It looks like a bug to me. But I think you'd want to short-circuit the calls to ufuncs in the case of scalars for performance reasons anyway, so its a question of trying to keep it from being too messy. Since either a or b could be scalars, you need to check them both. Put that check into a function that conditionally returns an output array or None, and you could do

def scalar_test_np_zeros(a, b):
    """Return np.zeros for the length of arguments unless both
    arguments are scalar, then None."""
    if a_is:=np.isscalar(a) and np.isscalar(b):
        return None
    else:
        return np.zeros(len(a) if a_is else len(b))

a = 0
b = 4
if o := scalar_test_np_zeros(a, b) is None:
    o = (a-b)/b if a and b else 0.0
else:
    np.divide(np.subtract(a, b), b, out=o, 
        where=np.logical_and(b != 0, a != 0))

The scalar test would be useful in other code with similar problems.

CodePudding user response：

For what it's worth, if I helps anyone I have come to the conclusion I need to wrap np.divide to use it safely in functions which can take arrays and scalars. This is my wrapping function:

import numpy as np

def divide_where(a, b, where, out=None, fill=0):
    """ wraps numpy divide to safely handle where clause for both arrays and scalars
    - a: dividend array or scalar
    - b: divisor array or scalar
    - where: locations where is True a/b will be set
    - out: location where data is written to; if None, an output array will be created using fill value
    - fill: defines fill value. if scalar and where True value will used; if out not set fill value is used creating output array
    """
    if (a_is_scalar := np.isscalar(a)) and np.isscalar(b):
        return fill if not where else a / b
    if out is None:
        out = np.full_like(b if a_is_scalar else a, fill)
    return np.divide(a, b, out=out, where=where)