How to map function over numpy with condition on each variable?

Time:07-04

I get this error when trying to map this function over a numpy array:

>>> a = np.array([1, 2, 3, 4, 5])
>>> g = lambda x: 0 if x % 2 == 0 else 1
>>> g(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()     

I was expecting the result array([1, 0, 1, 0, 1]).

Yet it works fine in this case:

>>> f = lambda x: x ** 2
>>> f(a)
array([ 1,  4,  9, 16, 25])

What can I do to map function g over the array a faster than a for loop, preferably using some of numpy's faster code?

CodePudding user response:

This code has a problem:

a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)

A lambda is essentially just an unnamed function, which you happen to be naming here, so you might as well define it with def:

def g(x):
    return 0 if x % 2 == 0 else 1

But that's still a bit odd, since taking an integer modulo 2 already is 0 or 1, so this would be the same (when applied to integers, which is what you're looking to do):

def g(x):
    return x % 2

At which point you have to wonder if a function is needed at all. It isn't; this works:

a = np.array([1, 2, 3, 4, 5])
a % 2

However, note where the misunderstanding lies: f = lambda x: x ** 2 followed by f(a) does not work because the function is applied to each element. The operation is applied to the array as a whole, and the array supports spreading the operation over its elements (numpy calls this an element-wise operation) for raising to a power, just as it does for the modulo operator, which is why a % 2 works. The original g fails because 0 if ... else 1 needs a single truth value, and the array a % 2 == 0 does not have one.

Result of a % 2 (the dtype shown depends on the platform):

array([1, 0, 1, 0, 1], dtype=int32)

Note that this type of spreading isn't something that generally works - you shouldn't expect Python to just do the spreading when needed for any data type (like a list or set). It's a feature of numpy's implementation of arrays: the operators have been defined on the array type and implemented to apply the operation to each element.
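A minimal sketch of that difference, putting a plain list next to the array from the question (the names values, list_error, remainders and list_remainders are only illustrative):

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# A plain list does not define the % operator, so Python raises TypeError.
try:
    values % 2
    list_error = None
except TypeError as exc:
    list_error = exc

# The numpy array defines % and applies it to every element.
a = np.array(values)
remainders = a % 2

# With a list, the per-element work has to be written out explicitly.
list_remainders = [x % 2 for x in values]
```

The list comprehension gives the same values as a % 2, but it loops in Python rather than in numpy's compiled code.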

CodePudding user response:

You can execute some mathematical operations, such as exponents, on entire numpy arrays, so f(a) is the equivalent of np.array([1, 2, 3, 4, 5]) ** 2. The modulus operator works the same way; the error comes from the conditional expression, which needs a single truth value but receives the whole boolean array a % 2 == 0. The lambda function is being applied to the entire array here, not to each individual element.

You can use np.vectorize here instead:

modulus = np.vectorize(lambda x: 0 if x % 2 == 0 else 1)
modulus(a)
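Keep in mind that the numpy documentation describes np.vectorize as a convenience that is essentially a for loop, so it may not give the speed-up the question asks for. For conditions that don't reduce to plain arithmetic, np.where is one alternative. A minimal sketch, with result and vectorized as illustrative names:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# np.where(condition, x, y) takes x where the condition is True and y elsewhere.
# The condition is evaluated element-wise on the whole array at once.
result = np.where(a % 2 == 0, 0, 1)

# Same values via np.vectorize, which calls the lambda once per element.
vectorized = np.vectorize(lambda x: 0 if x % 2 == 0 else 1)(a)
```

Both give the 1, 0, 1, 0, 1 pattern the question expected; np.where avoids calling a Python function for every element.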

CodePudding user response:

Let's decompose this. The general form of your ternary expression is x if C else y, with C being the comparison. Let's pull out just that comparison to see what we get:

>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2 == 0
array([False,  True, False,  True, False])

That mod and comparison give a new array where each value has been taken modulo 2 and compared with 0. Now you throw in an if (in this case, if array([False, True, False, True, False])). But should this if be true when all of the array elements are True, or when just a single one is? What if the array is empty, is that different? That's why numpy has the methods listed in the error message (a.any() and a.all()) - you have to decide what True means.
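The ambiguity can be sketched as follows; mask, raised, any_even and all_even are names made up for this illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
mask = a % 2 == 0  # element-wise comparison, one boolean per element

# Asking for the truth value of the whole mask is what the `if` does,
# and numpy refuses to guess for arrays with more than one element.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# The two explicit choices named in the error message:
any_even = mask.any()  # True if at least one element is even
all_even = mask.all()  # True only if every element is even
```

For this array, any_even is true and all_even is false, which is exactly why a bare if cannot pick one for you.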

In this case, though, there is nothing to decide. You really just wanted each element modulo 2, and you don't have to work so hard to get it.

>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2
array([1, 0, 1, 0, 1])

There's your answer!
