Pad numpy array so each row is offset by one from the previous row

I am trying to figure out how to pad an array using the pattern shown below:

For example:

[[1,2,3],
 [4,5,6],
 [7,8,9]]

Would become:

[[1,2,3,0,0],
 [0,4,5,6,0],
 [0,0,7,8,9]]

I figured out how to do it using manual looping, but I suspect there is a much faster way that uses NumPy's array manipulation functions. I just can't figure out what it is.

CodePudding user response：

AFAIK, no NumPy function does this directly. You can loop over the rows in Python, but the per-row interpreter overhead makes this inefficient unless the 2D array is huge.
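
For reference, the row-by-row loop mentioned above might look like the following. This is only a minimal baseline sketch: the function name row_shifts_loop is made up for illustration, and it assumes a 2D input array.

```python
import numpy as np

def row_shifts_loop(arr):
    # Hypothetical helper name; a plain row-by-row baseline.
    n, m = arr.shape
    out = np.zeros((n, n + m - 1), dtype=arr.dtype)
    for i in range(n):
        # Row i starts at column i, so it occupies columns i .. i + m - 1.
        out[i, i:i + m] = arr[i]
    return out

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
print(row_shifts_loop(data))
```

Each iteration copies a whole row with one slice assignment, so the Python-level cost grows with the number of rows rather than the number of elements.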

Numba solution

One solution to do this very efficiently is to use Numba and trivial loops:

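import numba as nb
import numpy as np

@nb.njit
def row_shifts_numba(arr):
    n, m = arr.shape
    # Each row is shifted by its index, so the output needs n + m - 1 columns.
    out = np.zeros((n, n + m - 1), dtype=arr.dtype)
    for i in range(n):
        for j in range(m):
            # Element (i, j) lands i columns to the right.
            out[i, i + j] = arr[i, j]
    return out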

data = np.array([[1,2,3],
                 [4,5,6],
                 [7,8,9]])
row_shifts_numba(data)

Note that the first execution is slower due to the just-in-time compilation. If you do not want to pay this compilation time during the first execution, you can specify the type of the array in the signature (e.g. @nb.njit('int32[:,::1](int32[:,::1])'), where int32 is the input/output array type and ::1 means the axis is contiguous).

Alternative pure-Numpy solution

Another solution is to use a NumPy reshape trick to generate the output 2D array. The idea is to create a bigger 2D array and then reshape it to produce the shifts:

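def row_shifts_numpy(arr):
    n, m = arr.shape
    # Pad every row with n trailing zeros.
    out = np.zeros((n, n + m), dtype=arr.dtype)
    out[:n, :m] = arr
    # Drop the last n zeros, then reshape with rows one element shorter.
    return out.reshape(-1)[:-n].reshape(n, n + m - 1)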

This should be a bit slower than Numba due to the (implicit) creation of temporary arrays, but it is fully vectorized and only uses NumPy.
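
To see why the reshape produces the shifts, it may help to walk through the intermediate arrays on the 3x3 example from the question. The sketch below just unrolls the same steps; the names padded and flat are introduced here for illustration only.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
n, m = arr.shape

# Step 1: pad each row on the right with n zeros -> shape (n, n + m).
padded = np.zeros((n, n + m), dtype=arr.dtype)
padded[:, :m] = arr

# Step 2: flatten and drop the last n zeros, leaving n * (n + m - 1) elements.
flat = padded.reshape(-1)[:-n]

# Step 3: reshape with rows one element shorter. Each row now starts one
# position earlier in the flat buffer than the previous one did, so the data
# appears shifted right by one column per row.
out = flat.reshape(n, n + m - 1)
print(out)
```

For a freshly allocated, contiguous buffer like this one, every step after the initial assignment should be a view rather than a copy, so out shares memory with padded. Call out.copy() if an independent array is needed.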

CodePudding user response：

After creating a zero array with the desired shape, we can build an index array whose values are shifted row by row, and then use it to fill the zero array with the original array's values:

# arr is the input 2D array
n, m = arr.shape
result = np.zeros((n, n + m - 1), dtype=np.int64)

first_col_ind = np.arange(m)
ind = first_col_ind[:, None] + np.arange(n)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

result[np.arange(n), ind] = arr.T

# result:
# [[1 2 3 0 0]
#  [0 4 5 6 0]
#  [0 0 7 8 9]]
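
The same indexing idea can be wrapped in a function that keeps the input dtype and handles non-square arrays. This is a variation on the snippet above, not the answerer's exact code: the name row_shifts_index is hypothetical, and the index arrays are laid out as (n, m) so that no transpose is needed.

```python
import numpy as np

def row_shifts_index(arr):
    # Hypothetical wrapper around the index-array approach.
    n, m = arr.shape
    out = np.zeros((n, n + m - 1), dtype=arr.dtype)  # keep the input dtype
    rows = np.arange(n)[:, None]   # shape (n, 1)
    cols = rows + np.arange(m)     # shape (n, m); cols[i, j] == i + j
    out[rows, cols] = arr          # rows broadcasts against cols
    return out

rect = np.array([[1.5, 2.5],
                 [3.5, 4.5],
                 [5.5, 6.5]])
print(row_shifts_index(rect))
```

Using arr.dtype instead of a fixed np.int64 avoids silently truncating floating-point inputs such as rect.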