Here is a minimal working example of a simple NumPy duck array that I've been using for numeric data:
import numpy as np


class DuckArray(np.lib.mixins.NDArrayOperatorsMixin):

    def __init__(self, array: np.ndarray):
        self.array = array

    def __repr__(self):
        return f'DuckArray({self.array})'

    def __array_ufunc__(self, function, method, *inputs, **kwargs):
        # Normalize inputs
        inputs = [inp.array if isinstance(inp, type(self)) else inp for inp in inputs]

        # Loop through inputs until we find a valid implementation
        for inp in inputs:
            result = inp.__array_ufunc__(function, method, *inputs, **kwargs)
            if result is not NotImplemented:
                return type(self)(result)

        return NotImplemented
The real version of this class has an implementation of __array_function__ as well, but this question only involves __array_ufunc__.
As we can see, this implementation works for numeric dtypes.
In [1]: a = DuckArray(np.array([1, 2, 3]))
In [2]: a + 2
Out[2]: DuckArray([3 4 5])
In [3]: a == 2
Out[3]: DuckArray([False True False])
But it fails with a numpy.core._exceptions._UFuncNoLoopError if the array has a string dtype:
In [4]: b = DuckArray(np.array(['abc', 'def', 'ghi']))
In [5]: b == 'def'
Traceback (most recent call last):
File "C:\Users\byrdie\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-6-c5975227701e>", line 1, in <module>
b == 'def'
File "C:\Users\byrdie\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\mixins.py", line 21, in func
return ufunc(self, other)
File "<ipython-input-2-aced4bbdd318>", line 15, in __array_ufunc__
result = inp.__array_ufunc__(function, method, *inputs, **kwargs)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'equal' did not contain a loop with signature matching types (dtype('<U3'), dtype('<U3')) -> dtype('bool')
Yet the same operation works fine on the raw array:
In [6]: b.array == 'def'
Out[6]: array([False, True, False])
This suggests to me that the ufunc loop does exist, so something is clearly going awry.
Does anyone know where I am going wrong?
Answer:
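When you create a NumPy string array, it gets a fixed-width Unicode dtype <Un, where n is the length of the longest string. Each element you index out is an np.str_ scalar whose dtype reflects its own length: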
np.array(['abc', 'defg'])[0].dtype
>> dtype('<U3')
np.array(['abc', 'defg'])[1].dtype
>> dtype('<U4')
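The np.equal ufunc has no loop for comparing <Un dtypes, so you get an error when NDArrayOperatorsMixin uses it to compare two <U3 values such as 'abc' and 'def'. The plain b.array == 'def' still works because, at least in older NumPy versions, ndarray's own == operator special-cases string arrays and compares them without going through the np.equal ufunc. So the fact that == works on the raw array does not mean the ufunc loop exists.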
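Whether np.equal has a loop for fixed-width strings depends on the NumPy version; newer releases include string comparison loops, so the error may not reproduce everywhere. A minimal way to check on your own installation is to call the ufunc directly and catch the TypeError raised when no loop matches (NumPy's _UFuncNoLoopError subclasses TypeError). has_ufunc_loop is a hypothetical helper for illustration, not a NumPy API.

```python
import numpy as np


def has_ufunc_loop(ufunc, *operands):
    """Hypothetical helper: return True if calling `ufunc` on `operands` succeeds.

    It simply tries the call and treats any TypeError (NumPy's no-loop error
    is a TypeError subclass) as "no matching loop".
    """
    try:
        ufunc(*operands)
    except TypeError:
        return False
    return True


strings = np.array(['abc', 'def', 'ghi'])

# ndarray's own == operator on the raw string array.
print(strings == 'def')

# Calling the np.equal ufunc directly: the result depends on the NumPy version.
print(has_ufunc_loop(np.equal, strings, 'def'))

# Object arrays always have an equal loop that calls Python's ==.
print(has_ufunc_loop(np.equal, strings.astype(object), 'def'))  # True
```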
To fix it, explicitly set dtype=object when creating the string array:
DuckArray(np.array(['abc', 'def'], dtype=object)) == 'abc'
>> DuckArray([ True False])
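If converting every string array to object dtype by hand is inconvenient, the conversion can be done inside __array_ufunc__ instead. The sketch below is a hypothetical variant, not the asker's real class, and it assumes that retrying string operands as object arrays is acceptable. It lets NumPy choose the loop with getattr(ufunc, method)(...), the pattern used in NumPy's NDArrayOperatorsMixin documentation, rather than calling each input's __array_ufunc__; that also avoids an AttributeError when a Python scalar comes first (as in 2 + a). It does not handle the out= keyword or __array_function__.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


def _as_object_if_string(x):
    # Assumption: fixed-width str/bytes operands may be compared as object arrays.
    arr = np.asarray(x)
    return arr.astype(object) if arr.dtype.kind in 'US' else x


class DuckArray(NDArrayOperatorsMixin):
    """Minimal duck array sketch with an object-dtype fallback for strings."""

    def __init__(self, array):
        self.array = np.asarray(array)

    def __repr__(self):
        return f'DuckArray({self.array})'

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap DuckArray operands; ndarrays and Python scalars pass through.
        inputs = tuple(x.array if isinstance(x, DuckArray) else x for x in inputs)
        try:
            # Let NumPy dispatch and pick the loop for the unwrapped inputs.
            result = getattr(ufunc, method)(*inputs, **kwargs)
        except TypeError:
            # No native loop (e.g. string comparisons on older NumPy):
            # retry once with string operands converted to object dtype.
            inputs = tuple(_as_object_if_string(x) for x in inputs)
            result = getattr(ufunc, method)(*inputs, **kwargs)
        if method == 'at':
            # ufunc.at works in place and returns None.
            return None
        if isinstance(result, tuple):
            # Ufuncs with several outputs (e.g. np.divmod) return a tuple.
            return tuple(DuckArray(r) for r in result)
        return DuckArray(result)


a = DuckArray(np.array([1, 2, 3]))
print(a + 2)   # DuckArray([3 4 5])
print(2 + a)   # DuckArray([3 4 5])

b = DuckArray(np.array(['abc', 'def', 'ghi']))
print(b == 'def')
```

On NumPy versions that already have string comparison loops, the fallback branch is simply never taken for b == 'def'.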