Home > Blockchain >  Fastest way to apply function along axes
Fastest way to apply function along axes

Time:01-21

In a time-critical code fragment, I need to apply a function along different axes of a tensor and sum results. A peculiar feature is that the number of axes of the tensor (ns_test) can be large. I came up with two implementations, where I move the current axis (moveaxis) to either zeroth (h_zero) or last (h_last) position, apply the function, and move the axis back. I am not sure it is the best way.

import numpy as np
import time

def h_last(state, km, ns):
    new_state = np.zeros_like(state)
    for i in range(ns):
        a = np.moveaxis(state, i 1, -1).copy()
        for k in range(km):
            a[..., k] = (k 0.5) * a[..., k]
        new_state  = np.moveaxis(a, -1, i 1)
    return new_state

def h_zero(state, km, ns):
    new_state = np.zeros_like(state)
    for i in range(ns):
        a = np.moveaxis(state, i 1, 0).copy()
        for k in range(km):
            a[k, ...] = (k 0.5) * a[k, ...]
        new_state  = np.moveaxis(a, 0, i 1)
    return new_state

# ==================== init ============================
km_test  = 4        
ns_test = 7
nreps = 100
dims = tuple([ns_test]   [km_test] * ns_test)    
y= np.random.rand(*dims)
    
# =================== first run =============================
tic = time.perf_counter()
for i in range(nreps):
    yy = h_last(y, km_test, ns_test)
toc = time.perf_counter()
print(f"Run time h_last {toc - tic:0.4f} seconds")

# =================== second run =============================
tic = time.perf_counter()
for i in range(nreps):
    yyy = h_zero(y, km_test, ns_test)
toc = time.perf_counter()
print(f"Run time h_zero {toc - tic:0.4f} seconds")

print(np.linalg.norm(yy-yy)) 

I am surprised a bit that the zeroth axis performs better (I thought python internally uses the C-order for storage). But my main question is how to further speed up the code? I looked into apply_along_axis, but this seems to be very slow.

CodePudding user response:

As discussed in comments moveaxis is fast. It is just a view. So it is an interesting method to work when you have so many axes that you need a loop to iterate axis number and perform an operation on a given axis.

What slows down your computation is the many copies you make of your array along the way. Plus, as said in comments, it is also this copy that makes the version with last axis being slightly slower, because of "memory cache, and C order" considerations. I won't develop here what I said in comments, because it is quite accessory (we are talking 20% performance lost. Not negligible, but really nothing compare to your real problem)

So one way to do the operation you described faster is to avoid these copies

For example, here is a code that, roughly use the same tricks as you do (using moveaxis)

tic = time.perf_counter()
kh=np.arange(0.5, km_test).reshape([1]*ns_test [-1])

Y=np.zeros_like(y)
for i in range(ns_test):
    Y  = y*np.moveaxis(kh, -1, i 1)
toc = time.perf_counter()
print(f"Run time arange {toc - tic:0.4f} seconds")

print(np.linalg.norm(yy-Y)) # Note that in your code you compare yy with yy. You probably meant yyy for one of the two yy.

Result on my PC

Run time h_last 1.2403 seconds
Run time h_zero 0.9995 seconds
Run time arange 0.0061 seconds
0.0

So result is the same. But computation is approx. 2 times faster.

Yet, I use moveaxis as often as you do. On a smaller array, sure, since I've chose to use it only on the arange array. Move moveaxis cost is proportional to the number of axes, not the size of the data (it is just some play with strides and things like that. No data is moved. That's the whole point). Also, because of this arange I avoid one for loop. But it is probably not what explain most of the performance ratio. since that is an outer for loop (the important ones performance-wise, are the inner one, inside numpy * operation). So, what remains to explain the performance ratio, is the copy, I would say.

Note, for example, that if I do this

Y  = np.moveaxis(np.moveaxis(y, i 1, -1)*kh, -1, i 1)

(that is moving axis of y instead of moving axis of kh. Which forces me to move the axis back before using the result, as you did) instead of my previous Y =... line, it is also the same result, and the same kind of performance, while being even closer to your method. moveaxis costs nothing, roughly. So my choice of moving axis of kh is clearer for my taste, but is not what makes performance difference.

  • Related