How to map function over numpy with condition on each variable?

I get this error when trying to map this function over the numpy array:

>>> a = np.array([1, 2, 3, 4, 5])
>>> g = lambda x: 0 if x % 2 == 0 else 1
>>> g(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I was expecting the result array([1, 0, 1, 0, 1]).

It works fine in this case, though:

>>> f = lambda x: x ** 2
>>> f(a)
array([ 1,  4,  9, 16, 25])

How can I map g over the array a faster than with a for loop, preferably using some of NumPy's faster vectorized code?

Answer:

This has problems:

a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)

A lambda is essentially just an anonymous function, and since you are assigning it to a name anyway, you might as well write:

def g(x):
    return 0 if x % 2 == 0 else 1

But that's still a bit odd, since an integer modulo 2 is already 0 or 1, so this is equivalent (when applied to integers, which is what you're working with):

def g(x):
    return x % 2
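A quick way to convince yourself of the equivalence is to compare both versions over a range of integers. This is a minimal sketch; the names g_conditional and g_modulo are hypothetical, chosen only to tell the two variants apart. Negative values are included because Python's % takes the sign of the divisor, so x % 2 is still 0 or 1 for them.

```python
# Hypothetical names: g_conditional is the original if/else version,
# g_modulo is the simplified version that just returns x % 2.
def g_conditional(x):
    return 0 if x % 2 == 0 else 1

def g_modulo(x):
    return x % 2

# Check both on negative and positive integers.
values = range(-5, 6)
same = all(g_conditional(x) == g_modulo(x) for x in values)
print(same)  # True
```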

At which point you have to wonder whether a function is needed at all. It isn't; this works:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
a % 2

However, note where the reasoning went wrong: f = lambda x: x ** 2 followed by f(a) works not because f is applied to each element. f is called once, with the whole array, and the array itself supports spreading the ** operation over its elements. NumPy arrays do the same for the modulo operator, which is why a % 2 works.
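One way to see this is to record what f actually receives. The sketch below rewrites the lambda f as a regular function (an illustrative change, not the asker's code) that appends its argument to a hypothetical calls list before squaring it:

```python
import numpy as np

calls = []  # records every argument f is called with

def f(x):
    calls.append(x)
    return x ** 2  # ** is applied by the array to all of its elements

a = np.array([1, 2, 3, 4, 5])
result = f(a)

print(len(calls))       # 1: f ran once, on the whole array
print(type(calls[0]))   # <class 'numpy.ndarray'>
print(result.tolist())  # [1, 4, 9, 16, 25]
```

f runs exactly once; the element-wise work happens inside NumPy's implementation of **.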

Result:

array([1, 0, 1, 0, 1], dtype=int32)

Note that this kind of spreading isn't something Python does in general: you shouldn't expect it to happen automatically for arbitrary data types such as a list or set. It's a feature of NumPy's array implementation; the operators are defined on the array type and implemented to apply element-wise.
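For contrast, here is a small sketch of the same operation on a plain Python list and on a NumPy array built from it. The variable names are illustrative only.

```python
import numpy as np

nums = [1, 2, 3, 4, 5]

# A plain list does not define %, so Python raises TypeError instead of spreading it.
try:
    nums % 2
    list_error = None
except TypeError as exc:
    list_error = exc
print("list:", list_error)

# Converting to a NumPy array gives element-wise behaviour.
arr_result = np.array(nums) % 2
print("array:", arr_result)  # array: [1 0 1 0 1]
```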

Answer:

You can execute some mathematical operations, such as exponentiation, on entire numpy arrays, so f(a) is the equivalent of np.array([1, 2, 3, 4, 5])**2. The modulus operator works on whole arrays too, so that is not what fails; the problem is that the lambda is being applied to the entire array here, not to each individual element.

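You can use np.vectorize here instead (note that, per the NumPy documentation, np.vectorize is provided for convenience rather than performance; internally it is essentially a for loop):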

modulus = np.vectorize(lambda x: 0 if x % 2 == 0 else 1)
modulus(a)
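If the mapping genuinely needs a per-element condition (rather than reducing to x % 2), np.where is an alternative not mentioned in this answer: it takes a boolean array as the condition and picks one of two values for each element without calling a Python function per element. A minimal sketch comparing both on the example array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# np.vectorize calls the Python lambda once per element (convenience, not speed).
modulus = np.vectorize(lambda x: 0 if x % 2 == 0 else 1)
vec_result = modulus(a)

# np.where(condition, x, y): the condition is a whole boolean array, and the
# choice between 0 and 1 is made element-wise inside NumPy.
where_result = np.where(a % 2 == 0, 0, 1)

print(vec_result.tolist())    # [1, 0, 1, 0, 1]
print(where_result.tolist())  # [1, 0, 1, 0, 1]
```

For this particular problem a % 2 is still the simplest option; np.where is the general pattern when the two branches are not just 0 and 1.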

Answer:

Let's decompose this. The general form of your ternary expression is x if C else y, with C being the comparison. Let's pull out just that comparison to see what we get:

>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2 == 0
array([False,  True, False,  True, False])

That modulo and comparison give a new array in which each value has been reduced modulo 2 and compared. Now you throw in an if (in this case if array([False, True, False, True, False])). But should this if be true if all of the array elements are True, or if just a single one is? What if the array is empty; is that different? That's why NumPy suggests the methods listed in the error message: you have to decide what True means.
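The ambiguity can be reproduced directly. This sketch converts the boolean array to a single truth value (which is what the if in the lambda tries to do) and then shows how .any() and .all() resolve it, including the empty-array case:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
mask = a % 2 == 0  # array([False,  True, False,  True, False])

# Treating the whole array as one condition is ambiguous, so NumPy raises.
try:
    bool(mask)
    ambiguous_error = None
except ValueError as exc:
    ambiguous_error = exc
print(ambiguous_error)

# .any() and .all() make the choice explicit.
print(mask.any())  # True: at least one element is even
print(mask.all())  # False: not every element is even

# The empty case differs: any() of nothing is False, all() of nothing is True.
empty = np.array([], dtype=bool)
print(empty.any(), empty.all())  # False True
```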

But you don't need to make that decision here. You really just want each element modulo 2, and you don't have to work so hard to get it.

>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2
array([1, 0, 1, 0, 1])

There's your answer!