I am testing the Pipe library for Python, which tries to simplify the processing of iterables, essentially allowing one to write
ys = xs | f1 | f2 | f3
instead of
ys = f3(f2(f1(xs)))
The Pipe class itself is very short: it just calls the wrapped function when it encounters a __ror__ operation:
import functools

class Pipe:
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        return self.function(other)

    def __call__(self, *args, **kwargs):
        return Pipe(lambda x: self.function(x, *args, **kwargs))
However, I found a particular case where the output is not as expected:
import numpy as np
from pipe import Pipe
y1 = np.array([1, 2, 3]) | Pipe(np.median)
y2 = [1, 2, 3] | Pipe(np.median)
print(y1) # [1.0 2.0 3.0]
print(y2) # 2.0
In both cases, I'd assume that np.median is applied on the iterable, but when the input is a numpy array, the function is mapped over the array, instead of making the array the actual argument (as is the case when using a plain list). That is, in the __ror__ method, the other argument is an item from the array, as opposed to the actual array itself. I don't see how this can happen. Is it some numpy quirk?
Here's the full reproducible example.
I didn't find any related issue in the project's repo.
CodePudding user response:
"I don't see how this can happen. Is it some numpy quirk?"
NumPy has broadcasting. With the array np.array([1, 2, 3]) on the left, the __or__ method of the NumPy array treats the object Pipe(np.median) that it gets as its other argument as a scalar, and broadcasts it so that it acts like [Pipe(np.median), Pipe(np.median), Pipe(np.median)] (i.e. so the shape of other matches the shape of self).
Because Pipe(np.median) is not a type known to NumPy, the other argument is treated as an object array, and the __or__ operation will attempt to operate elementwise on the inputs. That means that eventually, the __ror__ method of Pipe(np.median) is called, but it happens for each pair in np.array([1, 2, 3]) and [Pipe(np.median), Pipe(np.median), Pipe(np.median)]. That's why you get the result applied to each element, rather than np.median applied to the array on the left as a whole. That is also why the data type of the result is object rather than a NumPy floating point type:
In [19]: np.array([1, 2, 3]) | Pipe(np.median)
Out[19]: array([1.0, 2.0, 3.0], dtype=object)
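The elementwise dispatch can be observed directly. The following is a minimal sketch, assuming NumPy is installed; Recorder is a hypothetical stand-in for Pipe (it is not part of either library) that logs every left operand its __ror__ method receives.

```python
import numpy as np


class Recorder:
    """Hypothetical stand-in for Pipe: logs each left operand seen by __ror__."""

    def __init__(self):
        self.seen = []

    def __ror__(self, other):
        self.seen.append(other)
        return other


# A list has no __or__, so Python falls back to Recorder.__ror__ once,
# passing the whole list.
list_recorder = Recorder()
list_result = [1, 2, 3] | list_recorder

# ndarray defines __or__, so NumPy handles the operation itself and ends up
# calling Recorder.__ror__ once per element.
array_recorder = Recorder()
array_result = np.array([1, 2, 3]) | array_recorder

print(len(list_recorder.seen))   # one call, with the whole list
print(len(array_recorder.seen))  # one call per array element
print(array_result.dtype)        # object
```

With the list, the single recorded operand is the list itself; with the array, the recorded operands are the individual elements.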
Regarding the handling of classes that implement __or__, the author of Pipe says "That's the very limit of Pipe, there's no clean way to overcome this." See also "Pipe does not work if left side object defines __or__".
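For NumPy arrays specifically, there is a documented opt-out: a class that sets __array_ufunc__ = None tells the binary operators of ndarray to return NotImplemented, so Python falls back to the reflected method of the right-hand operand. The sketch below assumes NumPy 1.13 or later. ArrayAwarePipe is a hypothetical name, not part of the Pipe library, and the trick does nothing for other left-hand types whose __or__ accepts arbitrary operands.

```python
import functools

import numpy as np


class ArrayAwarePipe:
    """Hypothetical Pipe variant that asks NumPy to defer to __ror__."""

    # Setting this to None makes ndarray.__or__ return NotImplemented, so
    # Python then tries the reflected method below.
    __array_ufunc__ = None

    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        return self.function(other)

    def __call__(self, *args, **kwargs):
        return ArrayAwarePipe(lambda x: self.function(x, *args, **kwargs))


y1 = np.array([1, 2, 3]) | ArrayAwarePipe(np.median)
y2 = [1, 2, 3] | ArrayAwarePipe(np.median)
print(y1, y2)  # both are the median of the whole input
```

The only change from the class in the question is the class attribute; the list case behaves exactly as before.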
For what it's worth...
A modification of the code given in the question that would allow Pipe to work with a NumPy array (or any other object that implements __or__) is to provide an option to put a thin wrapper around data before it is sent into the pipe. If the __ror__ method of Pipe sees this wrapper, it unwraps the data, calls its function, and rewraps the result before returning it. A final step at the end of the pipe is needed to unwrap data that was wrapped. Perhaps something like this:
import functools

class Wrap:
    def __init__(self, obj):
        self._object = obj

    def unwrap(self):
        return self._object

class Pipe:
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        if isinstance(other, Wrap):
            return Wrap(self.function(other.unwrap()))
        else:
            return self.function(other)

    def __call__(self, *args, **kwargs):
        return Pipe(lambda x: self.function(x, *args, **kwargs))

class Unwrap:
    def __ror__(self, other):
        if isinstance(other, Wrap):
            return other.unwrap()
        else:
            return other
For example,
In [13]: a = np.array([1, 2, 3])
In [14]: Wrap(a) | Pipe(np.median) | Unwrap()
Out[14]: 2.0
In [15]: b = [1, 2, 3]
In [16]: Wrap(b) | Pipe(np.median) | Unwrap()
Out[16]: 2.0
One more follow-up...
If it is acceptable to require a wrapper at the input of the pipe and an unwrapper at the output, the pipe mechanism can be changed to allow arbitrary callables in the pipe without being wrapped in a Pipe class:
# A token to indicate the end of a pipe.
ExitPipe = None

class EnterPipe:
    def __init__(self, obj):
        self._object = obj

    def __or__(self, other):
        # Identity check: `==` would compare elementwise if `other` were an array.
        if other is ExitPipe:
            return self._object
        if not callable(other):
            raise TypeError(f"pipe element '{other}' is not callable")
        return EnterPipe(other(self._object))
For example,
In [10]: a = np.array([1, 2, 3])
In [11]: EnterPipe(a) | np.sin | np.cos | max | ExitPipe
Out[11]: 0.9900590857598653
In [12]: max(np.cos(np.sin(a)))
Out[12]: 0.9900590857598653