Numpy: apply function that creates an array

Numpy apply_along_axis/apply_over_axes assume that the applied function returns a scalar, but what if I want to use a function that returns an array (thus adding new dimensions)?

Below is a simplified example. I want to apply my_func to each row of an array. I could do this in pandas but expect numpy to be faster. Function:

def my_func(k):
    x = np.arange(3)
    y = x ** k
    return y

Original array:

array([[1],
       [2],
       [3]])

Expected result:

array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]], dtype=int32)

Update: it was an oversimplified example. I should have said the real function can only take a scalar as input. But the solution proposed by Michael Szczesny in comments works for such functions too.

Update2: I should have said a function that does not broadcast, like this:

def my_func(k):
    return np.random.randint(1, 4, 5) + k

CodePudding user response：

Here is some code for your reference:

import numpy as np
def my_func(k):
    x = np.arange(4)
    y = x ** k
    return y
inp = np.array([[1],[2],[3]])
print(my_func(inp))

Output:

[[ 0  1  2  3]
 [ 0  1  4  9]
 [ 0  1  8 27]]

See if it helps.

CodePudding user response：

Your function, with an added print to see exactly what k is:

In [39]: def my_func(k):
    ...:     print(k)
    ...:     x = np.arange(4)     # range to match your expected result
    ...:     y = x ** k
    ...:     return y
    ...:

As written, the function works with your (3,1) array, arr = np.arange(1,4)[:,None]:

In [40]: my_func(arr)
[[1]
 [2]
 [3]]
Out[40]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])

Note that the whole 2d array is passed. The x**k step works by broadcasting a (4,) array against a (3,1) array to produce a (3,4) result. If possible, you should write functions that work like this, taking full advantage of numpy methods and operators.
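To make the broadcasting step explicit, here is a small sketch that only looks at the shapes involved. The variable names are just for illustration.

```python
import numpy as np

x = np.arange(4)                # shape (4,)
k = np.arange(1, 4)[:, None]    # shape (3, 1), same as arr above

# np.broadcast reports the common shape both operands are stretched to
common_shape = np.broadcast(x, k).shape

# Each row of the result is x raised to one of the exponents in k
result = x ** k

print(common_shape)
print(result)
```

No Python-level loop runs here; the whole (3,4) result is computed in one array operation.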

apply_along_axis can be used like this:

In [41]: np.apply_along_axis(my_func, 1, arr)
[1]
[2]
[3]
Out[41]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])

Note that it passes (1,) arrays to the function. The docs should make it clear that this is designed to pass a 1d array to the function, NOT a scalar.
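A quick way to check this, and to see what happens when the function returns an array of a different length, is a sketch like the following. min_max is a made-up helper, not anything from numpy.

```python
import numpy as np

def min_max(v):
    # v is a 1d slice taken along the chosen axis, never a scalar
    assert v.ndim == 1
    return np.array([v.min(), v.max()])

arr3 = np.arange(24).reshape(2, 3, 4)

# The length-4 axis is replaced by the length-2 array that min_max returns
out = np.apply_along_axis(min_max, 2, arr3)

print(out.shape)
print(out[0, 0])
```

So the applied function does not have to return a scalar; its output takes the place of the axis it was applied along.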

The equivalent for a 2d array arr is:

In [42]: np.array([my_func(i) for i in arr])
[1]
[2]
[3]
Out[42]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])

Now let's comment out the print and do some time tests:

In [44]: timeit my_func(arr)
7.41 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [45]: timeit np.apply_along_axis(my_func, 1, arr)
89.2 µs ± 649 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [46]: timeit np.array([my_func(i) for i in arr])
28.9 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

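The broadcasted approach is fastest. apply_along_axis is slowest, roughly 12 times slower than broadcasting and 3 times slower than the list comprehension in this test.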

I claim that apply_along_axis is only useful when the array dimensions are greater than 2, and even then it just makes the code prettier, not faster.

For example, with a 3d array whose last axis still broadcasts against the (4,) shape x:

In [47]: arr = np.arange(24).reshape(2,3,4)
In [49]: np.apply_along_axis(my_func, 2, arr).shape
Out[49]: (2, 3, 4)
In [50]: my_func(arr).shape
Out[50]: (2, 3, 4)
In [51]: np.array([[my_func(arr[i,j,:]) for j in range(3)] for i in range(2)]).shape
Out[51]: (2, 3, 4)

The list iteration requires a double loop. apply_along_axis hides this, but does not reduce the total number of calls to my_func.
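One way to check the call count yourself is to record every call, as in this sketch. The calls list and the % 3 on the input are my additions; the latter just keeps the exponents small.

```python
import numpy as np

calls = []  # records the shape of every argument my_func receives

def my_func(k):
    calls.append(k.shape)
    x = np.arange(4)
    return x ** k

# Exponents limited to 0..2 so the powers stay small
arr = np.arange(24).reshape(2, 3, 4) % 3

calls.clear()
via_apply = np.apply_along_axis(my_func, 2, arr)
n_apply = len(calls)

calls.clear()
via_loops = np.array([[my_func(arr[i, j, :]) for j in range(3)] for i in range(2)])
n_loops = len(calls)

print(n_apply, n_loops)                     # one call per 1d slice in both cases
print(np.array_equal(via_apply, via_loops))
```

Both versions call my_func once per 1d slice (2 x 3 of them here), so any speed difference comes from overhead around the calls, not from fewer calls.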

If your function really requires a scalar (e.g. it uses math.cos or an if test), then you might consider np.vectorize. For small examples it's slower than the equivalent list comprehension, but it does scale better for large ones. But again, if you can write the function to work directly with arrays, you'll be much happier with the performance.
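For a scalar-only function that returns an array, np.vectorize needs its signature parameter so it knows each scalar maps to a 1d result. A minimal sketch, where scalar_powers is a made-up stand-in for such a function (math.pow only accepts scalars):

```python
import math
import numpy as np

def scalar_powers(k):
    # Hypothetical scalar-only function: math.pow fails on multi-element arrays
    return np.array([math.pow(x, k) for x in range(4)])

# "()->(n)" means: scalar in, 1d array of some length n out
vfunc = np.vectorize(scalar_powers, signature="()->(n)")

arr = np.arange(1, 4)[:, None]    # shape (3, 1), as in the question

out = vfunc(arr.ravel())          # loop over 3 scalars -> shape (3, 4)
out_keepdims = vfunc(arr)         # loop over (3, 1) scalars -> shape (3, 1, 4)

print(out)
print(out_keepdims.shape)
```

This is still a Python-level loop with one call per element, so treat it as a convenience rather than a speedup. The result is float here because math.pow returns floats.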