I am trying to figure out how to pad an array using the pattern shown below:
0000
0 000
00 00
000 0
0000
For example:
[[1,2,3],
[4,5,6],
[7,8,9]]
Would become:
[[1,2,3,0,0],
[0,4,5,6,0],
[0,0,7,8,9]]
I figured out how to do it using manual looping, but I feel like there is probably a much faster way that uses numpy's array manipulation functions. I just can't figure out how to do it.
CodePudding user response:
AFAIK, there is no function in Numpy that does exactly that. You can use a Python loop iterating over the rows, but this solution is inefficient unless the 2D array is huge.
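For reference, the row-by-row loop mentioned above could look like the following minimal sketch. The function name row_shifts_loop is made up for illustration; it does one slice assignment per row, so the Python-level overhead grows with the number of rows.

```python
import numpy as np

def row_shifts_loop(arr):
    # Hypothetical helper name, used only for this illustration.
    n, m = arr.shape
    out = np.zeros((n, n + m - 1), dtype=arr.dtype)
    for i in range(n):
        # Row i is copied starting at column i, i.e. shifted right by i.
        out[i, i:i + m] = arr[i]
    return out

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
print(row_shifts_loop(data))
```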
Numba solution
One solution to do this very efficiently is to use Numba and trivial loops:
import numba as nb
import numpy as np

@nb.njit
def row_shifts_numba(arr):
    n, m = arr.shape
    out = np.zeros((n, n + m - 1), dtype=arr.dtype)
    for i in range(n):
        for j in range(m):
            # Row i is shifted right by i columns
            out[i, i + j] = arr[i, j]
    return out

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
row_shifts_numba(data)
Note that the first execution is slower due to the just-in-time compilation. If you do not want to pay this compilation time during the first execution, you can specify the type of the array in the signature (e.g. @nb.njit('int32[:,::1](int32[:,::1])'), where int32 is the input/output array type and ::1 means the axis is contiguous).
Alternative pure-Numpy solution
Another solution consists in using a Numpy reshape trick to generate the output 2D array. The idea is to create a bigger 2D array and then reshape it to produce the shifts:
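def row_shifts_numpy(arr):
    n, m = arr.shape
    # Pad every row with n trailing zeros
    out = np.zeros((n, n + m), dtype=arr.dtype)
    out[:n, :m] = arr
    # Drop the last n zeros, then re-cut the buffer into rows of length n+m-1
    return out.reshape(-1)[:-n].reshape(n, n + m - 1)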
This should be a bit slower than Numba due to the (implicit) creation of temporary arrays, but it is fully vectorized and only uses Numpy.
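To see why the reshape trick produces the shifts, here is a step-by-step sketch on a small non-square input. The variable names wide, flat and shifted are chosen here for illustration only. Each row of the padded array is n+m long, while each row of the result is n+m-1 long, so every output row starts one element earlier in the flat buffer than the previous one and its data lands one column further right.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])          # n = 2 rows, m = 3 columns
n, m = arr.shape

# Step 1: pad each row on the right with n zeros -> shape (n, n+m).
wide = np.zeros((n, n + m), dtype=arr.dtype)
wide[:, :m] = arr
# [[1 2 3 0 0]
#  [4 5 6 0 0]]

# Step 2: flatten and drop the last n elements (they are padding zeros).
flat = wide.reshape(-1)[:-n]
# [1 2 3 0 0 4 5 6]

# Step 3: re-cut the buffer into rows of length n+m-1.
shifted = flat.reshape(n, n + m - 1)
# [[1 2 3 0]
#  [0 4 5 6]]
print(shifted)
```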
CodePudding user response:
After creating a zero array with the desired shape, we can create an index array in which the values are shifted row by row, and then fill the zero array with the original array values:
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
n, m = arr.shape
result = np.zeros((n, n + m - 1), dtype=np.int64)
first_col_ind = np.arange(m)
ind = first_col_ind[:, None] + np.arange(n)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
result[np.arange(n), ind] = arr.T
# result:
# [[1 2 3 0 0]
#  [0 4 5 6 0]
#  [0 0 7 8 9]]
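As a variant of the same indexing idea, the sketch below wraps it in a function, takes the dtype from the input instead of fixing it, and builds the row and column indices so that no transpose is needed. The name row_shifts_index is hypothetical and not part of the answer above.

```python
import numpy as np

def row_shifts_index(arr):
    # Hypothetical helper name, used only for this illustration.
    n, m = arr.shape
    result = np.zeros((n, n + m - 1), dtype=arr.dtype)
    rows = np.arange(n)[:, None]      # shape (n, 1)
    cols = rows + np.arange(m)        # shape (n, m), cols[i, j] = i + j
    result[rows, cols] = arr          # rows broadcasts against cols
    return result

# Non-square float input to check that shape and dtype are handled.
arr = np.array([[1.5, 2.5],
                [3.5, 4.5],
                [5.5, 6.5]])
print(row_shifts_index(arr))
```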