Different behaviour between NumPy array and plain list in Pipe library

I am testing the Pipe library for Python, which tries to simplify the processing of iterables, essentially allowing one to write

ys = xs | f1 | f2 | f3

instead of

ys = f3(f2(f1(xs)))

The Pipe class itself is very short: it simply calls the wrapped function whenever its __ror__ method is invoked, i.e. when a Pipe instance appears on the right-hand side of the | operator:

import functools 

class Pipe:
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        return self.function(other)

    def __call__(self, *args, **kwargs):
        return Pipe(lambda x: self.function(x, *args, **kwargs))

However, I found a particular case where the output is not as expected:

import numpy as np
from pipe import Pipe

y1 = np.array([1, 2, 3]) | Pipe(np.median)
y2 =          [1, 2, 3]  | Pipe(np.median)

print(y1)  # [1.0 2.0 3.0]
print(y2)  # 2.0

In both cases, I'd assume that np.median is applied to the whole iterable. But when the input is a NumPy array, the function is mapped over the array instead of receiving the array as its argument (as happens with a plain list). That is, in the __ror__ method, other is an item from the array rather than the array itself. I don't see how this can happen. Is it some numpy quirk?

Here's the full reproducible example.

I didn't find any related issue in the project's repo.

CodePudding user response:

"I don't see how this can happen. Is it some numpy quirk?"

NumPy has broadcasting. With the array np.array([1, 2, 3]) on the left, the __or__ method of the NumPy array treats the object Pipe(np.median) that it gets as its other argument as a scalar, and broadcasts it so that it acts like [Pipe(np.median), Pipe(np.median), Pipe(np.median)] (i.e. so the shape of other matches the shape of self).

Because Pipe(np.median) is not a type known to NumPy, the other argument is treated as an object array, and the __or__ operation operates elementwise on the inputs. For each pair taken from np.array([1, 2, 3]) and [Pipe(np.median), Pipe(np.median), Pipe(np.median)], Python evaluates item | Pipe(np.median). The item's own __or__ does not know what to do with a Pipe, so Python falls back to the __ror__ method of Pipe(np.median), which calls np.median on that single item. That's why you get the function applied to each element, rather than np.median applied to the array on the left as a whole. That is also why the data type of the result is object rather than a NumPy floating point type:

In [19]: np.array([1, 2, 3]) | Pipe(np.median)

Out[19]: array([1.0, 2.0, 3.0], dtype=object)
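The dispatch order described above can be reproduced without NumPy. The following is only a sketch: ElementwiseList is a hypothetical stand-in for an array (it is not NumPy code, and real broadcasting is more involved), and type_name is an illustrative helper that reports what the pipe actually received.

```python
import functools


class Pipe:
    """Same idea as the Pipe class in the question (without __call__)."""

    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        return self.function(other)


class ElementwiseList:
    """Hypothetical stand-in for an array: `|` is applied item by item."""

    def __init__(self, items):
        self.items = list(items)

    def __or__(self, other):
        # Evaluate `item | other` for every element, roughly as an
        # elementwise operation on objects would.
        return [item | other for item in self.items]


def type_name(x):
    """Illustrative helper: report the type the pipe was handed."""
    return type(x).__name__


# The left operand defines __or__, so it wins and hands out single items.
seen_elementwise = ElementwiseList([1, 2, 3]) | Pipe(type_name)

# A plain list has no __or__, so Python goes straight to Pipe.__ror__.
seen_from_list = [1, 2, 3] | Pipe(type_name)

print(seen_elementwise)  # ['int', 'int', 'int']
print(seen_from_list)    # list
```

In the first case the pipe only ever sees individual ints, which mirrors what happens when the NumPy array is on the left.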

Regarding the handling of classes that implement __or__, the author of Pipe says "That's the very limit of Pipe, there's no clean way to overcome this." See also "Pipe does not work if left side object defines __or__".

For what it's worth...

A modification of the code given in the question that would allow Pipe to work with a NumPy array (or any other object that implements __or__) is to provide an option to put a thin wrapper around data before it is sent into the pipe. If the __ror__ method of Pipe sees this wrapper, it unwraps the data, calls its function, and rewraps the result before returning it. A final step at the end of the pipe is needed to unwrap data that was wrapped. Perhaps something like this:

import functools 


class Wrap:
    def __init__(self, obj):
        self._object = obj

    def unwrap(self):
        return self._object


class Pipe:
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        if isinstance(other, Wrap):
            return Wrap(self.function(other.unwrap()))
        else:
            return self.function(other)

    def __call__(self, *args, **kwargs):
        return Pipe(lambda x: self.function(x, *args, **kwargs))


class Unwrap:
    def __ror__(self, other):
        if isinstance(other, Wrap):
            return other.unwrap()
        else:
            return other

For example,

In [13]: a = np.array([1, 2, 3])

In [14]: Wrap(a) | Pipe(np.median) | Unwrap()
Out[14]: 2.0

In [15]: b = [1, 2, 3]

In [16]: Wrap(b) | Pipe(np.median) | Unwrap()
Out[16]: 2.0
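The same wrapper also carries a value through several stages, including a parameterized stage created via __call__. Here is a self-contained sketch that avoids NumPy: ElementwiseList is a hypothetical stand-in for any left-hand object that defines its own __or__, and statistics.median stands in for np.median.

```python
import functools
import statistics


class Wrap:
    def __init__(self, obj):
        self._object = obj

    def unwrap(self):
        return self._object


class Pipe:
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __ror__(self, other):
        if isinstance(other, Wrap):
            # Unwrap, apply, and rewrap so the next stage is protected too.
            return Wrap(self.function(other.unwrap()))
        return self.function(other)

    def __call__(self, *args, **kwargs):
        return Pipe(lambda x: self.function(x, *args, **kwargs))


class Unwrap:
    def __ror__(self, other):
        return other.unwrap() if isinstance(other, Wrap) else other


class ElementwiseList:
    """Hypothetical stand-in for an object that defines its own __or__."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __or__(self, other):
        return [item | other for item in self.items]


data = ElementwiseList([4, 1, 3])

# Wrap has no __or__, so every stage is reached through Pipe.__ror__.
intermediate = Wrap(data) | Pipe(sorted)(reverse=True)
result = intermediate | Pipe(statistics.median) | Unwrap()

print(type(intermediate).__name__)  # Wrap
print(intermediate.unwrap())        # [4, 3, 1]
print(result)                       # 3
```

Because each stage returns a Wrap, intermediate results that happen to define __or__ themselves are shielded as well, not just the initial input.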

One more follow-up...

If it is acceptable to require a wrapper at the input of the pipe and an unwrapper at the output, the pipe mechanism can be changed to allow arbitrary callables in the pipe without being wrapped in a Pipe class:

# A token to indicate the end of a pipe.
ExitPipe = None


class EnterPipe:
    def __init__(self, obj):
        self._object = obj

    def __or__(self, other):
        # Identity check: == could trigger an elementwise comparison
        # if other happens to be an array.
        if other is ExitPipe:
            return self._object
        if not callable(other):
            raise TypeError(f"pipe element '{other}' is not callable")
        return EnterPipe(other(self._object))

For example,

In [10]: a = np.array([1, 2, 3])

In [11]: EnterPipe(a) | np.sin | np.cos | max | ExitPipe
Out[11]: 0.9900590857598653

In [12]: max(np.cos(np.sin(a)))
Out[12]: 0.9900590857598653
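One thing this variant gives up is the Pipe(...)(args) style of passing extra arguments to a stage. A possible way to get that back, sketched here with only the standard library, is functools.partial. The EnterPipe class is repeated so the snippet runs on its own (with an identity check against the ExitPipe token), and the input data is made up for illustration.

```python
import functools

# A token to indicate the end of a pipe.
ExitPipe = None


class EnterPipe:
    def __init__(self, obj):
        self._object = obj

    def __or__(self, other):
        if other is ExitPipe:
            return self._object
        if not callable(other):
            raise TypeError(f"pipe element '{other}' is not callable")
        return EnterPipe(other(self._object))


letters = ["b", "a", "c"]

# functools.partial pre-binds the keyword argument, leaving a one-argument
# callable that fits into the pipe; "".join is already such a callable.
result = (
    EnterPipe(letters)
    | functools.partial(sorted, reverse=True)
    | "".join
    | ExitPipe
)

print(result)  # cba
```

Any one-argument callable works as a stage, so lambdas are an alternative to functools.partial when the piped value is not the first positional parameter.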