Vectorize a simple Python loop with a different numpy.roll for each row

I've searched through a lot of similar questions, but none seems to answer my particular problem, so I finally decided to ask my own question.

I have an array with several independent time series, and I need to roll each one by a different amount. The array a has shape (N, L), where N is the number of time series and L is the length of each one. I want to store the rolled version of each time series in an array b with the same shape; the per-series shifts are stored in shf, an integer array of length N.

The loop is the following:

import numpy as np
b = np.empty_like(a)  # same shape and dtype as a
for i in range(N):
    b[i] = np.roll(a[i], shf[i])

Given the dimensions of the arrays and the huge number of time series in my program, however, this loop takes a long time. Since every time series is independent of the others, I would like to vectorize or parallelize this loop to speed my program up. I'm sure it is very straightforward, but I feel clumsy. Any ideas are welcome.

CodePudding user response：

Your way is likely the right way. You may be able to cut out some of the overhead by using numba or similar, but it probably won't be much. If you are willing to waste a whole bunch of memory (an integer index array with the same N by L shape as a), you can precompute the index array that gives you the desired roll, which may speed things up if you do this computation multiple times with a static shf:

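index = (np.arange(L) - shf[:, None]) % L  # shape (N, L): source column for each output position
b = np.take_along_axis(a, index, axis=1)   # gather along axis 1, row by row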
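
To check the precomputed-index idea end to end, here is a minimal sketch. The helper names build_roll_index and roll_rows and the sample shifts are illustrative assumptions, not part of the original code; the gather uses np.take_along_axis so that each row picks its own columns, and the result is compared against the row-by-row np.roll loop.

```python
import numpy as np

def build_roll_index(shf, L):
    """Return an (N, L) integer array of source columns for each output position."""
    # Output column j of row i takes input column (j - shf[i]) mod L,
    # which is what np.roll(a[i], shf[i]) does.
    return (np.arange(L) - np.asarray(shf)[:, None]) % L

def roll_rows(a, index):
    """Gather along axis 1 so that each row uses its own column indices."""
    return np.take_along_axis(a, index, axis=1)

a = np.arange(12).reshape(3, 4)
shf = np.array([1, -1, 6])  # illustrative shifts, including a negative one and one larger than L

index = build_roll_index(shf, a.shape[1])  # can be reused while shf stays the same
b = roll_rows(a, index)

# Reference result from the original row-by-row loop
expected = np.stack([np.roll(row, s) for row, s in zip(a, shf)])
print(np.array_equal(b, expected))
```

The index array is built once and can be reused for every array that shares the same shf, which is the case where the extra memory is most likely to pay off.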

CodePudding user response：

In [21]: a = np.arange(12).reshape(3,4)
In [22]: a
Out[22]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [23]: np.roll(a,2,1)
Out[23]: 
array([[ 2,  3,  0,  1],
       [ 6,  7,  4,  5],
       [10, 11,  8,  9]])

The code for roll is more complex because you can specify several axes, but essentially it does two sliced copies (per dimension). The result is a whole new array with every element moved. It's not as simple as the word might sound.

In [24]: res = np.empty_like(a)
In [25]: res[:,:2] = a[:,-2:]
In [26]: res[:,-2:] = a[:,:2]

In [27]: res
Out[27]: 
array([[ 2,  3,  0,  1],
       [ 6,  7,  4,  5],
       [10, 11,  8,  9]])

Doing a different roll for each row requires a different set of slices for each row. The only alternative to row-by-row iteration is to create advanced indexing arrays for the whole array. That still requires some sort of iteration to build the index arrays, and the indexing itself may be slower than the sliced copies.
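
As a rough sketch of what "a different set of slices for each row" looks like in plain NumPy, the function below applies the same two sliced copies to each row, with that row's own shift. The name roll_rows_sliced and the sample data are assumptions made for illustration; the shift is wrapped with a modulo so that negative or oversized values behave as they do with np.roll.

```python
import numpy as np

def roll_rows_sliced(a, shf):
    """Roll each row of a 2-D array by its own shift using two sliced copies."""
    n_rows, n_cols = a.shape
    b = np.empty_like(a)
    for i in range(n_rows):
        n = int(shf[i]) % n_cols      # normalize the shift to 0 <= n < n_cols
        b[i, n:] = a[i, :n_cols - n]  # head of the row moves right by n
        b[i, :n] = a[i, n_cols - n:]  # tail of the row wraps around to the front
    return b

a = np.arange(12).reshape(3, 4)
shf = np.array([2, 0, -1])  # illustrative per-row shifts
res = roll_rows_sliced(a, shf)

# Reference result: np.roll applied to each row separately
expected = np.stack([np.roll(row, s) for row, s in zip(a, shf)])
print(np.array_equal(res, expected))
```

This is still a Python-level loop over rows, so it mainly shows the structure of the copy rather than offering a speedup.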

CodePudding user response：

IIUC, this can be written as np.roll-equivalent, parallel, nopython numba code:

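import numba as nb
import numpy as np

@nb.njit(parallel=True)
def numba_(a, shf):
    b = np.empty_like(a)
    rows_num = a.shape[0]
    cols_num = a.shape[1]
    for i in nb.prange(rows_num):
        # Wrap the shift into [0, cols_num) so negative or large shifts behave like np.roll
        n = shf[i] % cols_num
        b[i, n:] = a[i, :cols_num - n]  # head of the row moves right by n
        b[i, :n] = a[i, cols_num - n:]  # tail of the row wraps to the front
    return b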

This gives the same results as the OP's code but much faster: it took 730 ms per loop on a Google Colaboratory CPU for the expected data volumes.