I have been trying to upgrade a library which has a bunch of geometric operations for scalars so they will work with numpy arrays as well. While doing this I noticed some strange behaviour with numpy divide.
In original code checks a normalised difference between to variables if neither variable is zero, swapping across to numpy this ended up looking something like:
import numpy as np
a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 3, 0, 4])
o = np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(a != 0, b != 0))
print(f'First implementation: {o}')
where I passed in a output buffer initialised to zero for instances which could not be calculated; this returns:
First implementation: [ 0. -0.5 -0.33333333 0. 0. ]
I had to slightly modify this for scalars as out
required an array, but it seemed fine.
a = 0
b = 4
o = None if np.isscalar(a) else np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(b != 0, a != 0))
print(f'Modified out for scalar: {o}')
returns
Modified out for scalar: 0.0.
Then ran this through some test functions and found a lot of them failed. Digging into this, I found that the first time I call the divide with a scalar with where
set to False
the function returns zero, but if I called it again, the second time it returns something unpredictable.
a = 0
b = 4
print(f'First divide: {np.divide(b, a, where=False)}')
print(f'Second divide: {np.divide(b, a, where=False)}')
returns
First divide: 0.0
Second divide: 4.0
Looking at the documentation, it says "locations within it where the condition is False will remain uninitialized", so I guess numpy as some internal buffer which is initially set to zero then subsequently it ends up carrying over an earlier intermediate value.
I am struggling to see how I can use divide
with or without a where
clause; if I use where
I get an unpredictable output and if I don't I can't protect against divide by zero. Am I missing something or do I just need to have a different code path in these cases? I realise I'm half way to a different code path already with the out
variable.
I would be grateful for any advice.
CodePudding user response:
It looks like a bug to me. But I think you'd want to short-circuit the calls to ufuncs in the case of scalars for performance reasons anyway, so its a question of trying to keep it from being too messy. Since either a
or b
could be scalars, you need to check them both. Put that check into a function that conditionally returns an output array or None
, and you could do
def scalar_test_np_zeros(a, b):
"""Return np.zeros for the length of arguments unless both
arguments are scalar, then None."""
if a_is:=np.isscalar(a) and np.isscalar(b):
return None
else:
return np.zeros(len(a) if a_is else len(b))
a = 0
b = 4
if o := scalar_test_np_zeros(a, b) is None:
o = (a-b)/b if a and b else 0.0
else:
np.divide(np.subtract(a, b), b, out=o,
where=np.logical_and(b != 0, a != 0))
The scalar test would be useful in other code with similar problems.
CodePudding user response:
For what it's worth, if I helps anyone I have come to the conclusion I need to wrap np.divide
to use it safely in functions which can take arrays and scalars. This is my wrapping function:
import numpy as np
def divide_where(a, b, where, out=None, fill=0):
""" wraps numpy divide to safely handle where clause for both arrays and scalars
- a: dividend array or scalar
- b: divisor array or scalar
- where: locations where is True a/b will be set
- out: location where data is written to; if None, an output array will be created using fill value
- fill: defines fill value. if scalar and where True value will used; if out not set fill value is used creating output array
"""
if (a_is_scalar := np.isscalar(a)) and np.isscalar(b):
return fill if not where else a / b
if out is None:
out = np.full_like(b if a_is_scalar else a, fill)
return np.divide(a, b, out=out, where=where)