Efficient way to insert similar rows (with only one column changing) right after each row in Numpy o-CodePudding

Say I have a Pandas dataframe with 4 rows and 5 columns. For simplicity I will convert it into a Numpy array, which looks like this:

import numpy as np

A = np.array([[23, 43, 23, 110, 5],
              [83, 32, 12, 123, 4],
              [58, 41, 59, 189, 1],
              [93, 77, 22, 170, 3]])

For each row, I want to insert some similar rows right after the row, with only the column 4 decreasing by 1 each time until 0. The expected output should look like this:

np.array([[23, 43, 23, 110, 5],
          [23, 43, 23, 110, 4],
          [23, 43, 23, 110, 3],
          [23, 43, 23, 110, 2],
          [23, 43, 23, 110, 1],
          [23, 43, 23, 110, 0],

          [83, 32, 12, 123, 4],
          [83, 32, 12, 123, 3],
          [83, 32, 12, 123, 2],
          [83, 32, 12, 123, 1],
          [83, 32, 12, 123, 0],

          [58, 41, 59, 189, 1],
          [58, 41, 59, 189, 0],

          [93, 77, 22, 170, 3],
          [93, 77, 22, 170, 2],
          [93, 77, 22, 170, 1],
          [93, 77, 22, 170, 0]])

Below is the code that I have come up with:

new_rows = []
for i, row in enumerate(A):
    new = A[i, 4] - 1
    while new >= 0:
        new_row = row.copy()
        new_row[4] = new
        new_rows.append(new_row)
        new -= 1
new_A = np.vstack([A, np.array(new_rows)])
print(new_A)

Output

[[ 23  43  23 110   5]
 [ 83  32  12 123   4]
 [ 58  41  59 189   1]
 [ 93  77  22 170   3]
 [ 23  43  23 110   4]
 [ 23  43  23 110   3]
 [ 23  43  23 110   2]
 [ 23  43  23 110   1]
 [ 23  43  23 110   0]
 [ 83  32  12 123   3]
 [ 83  32  12 123   2]
 [ 83  32  12 123   1]
 [ 83  32  12 123   0]
 [ 58  41  59 189   0]
 [ 93  77  22 170   2]
 [ 93  77  22 170   1]
 [ 93  77  22 170   0]]

Obviously, the code is not efficient since it doesn't use any Numpy vectorization. In reality, I have more than 4,000 original rows so a speed-up is definitely needed. Moreover, I cannot insert new rows right after each row. Is there any efficient way to do this in Numpy or Pandas?

CodePudding user response：

If you are interested in a solution not using any pure-Python loops but Numba instead, then here is one:

import numba as nb

@nb.njit('int32[:,::1](int32[:,::1])')
def compute(A):
    n, m = A.shape
    rows = A[:,-1].sum()   n
    res = np.empty((rows, m), dtype=np.int32)
    row = 0
    for i in range(n):
        count = A[i, -1]
        for j in range(count 1):
            res[row j, 0:m-1] = A[i, 0:m-1]
            res[row j, m-1] = count-j
        row  = count 1
    return res

result = compute(A)

This solution is 12 times faster than the solution of @sammywemmy on my machine although A is very small. It should be even faster on bigger inputs.

CodePudding user response：

arr = A[:, -1]   1

temp = np.repeat(A, arr, axis = 0)

# depending on your array size
# you can build the range here with a much faster implementation
from this link : # https://stackoverflow.com/a/47126435/7175713
arr = np.concatenate([np.arange(ent) for ent in arr])

temp[:, -1] = temp[:, -1] - arr

temp

array([[ 23,  43,  23, 110,   5],
       [ 23,  43,  23, 110,   4],
       [ 23,  43,  23, 110,   3],
       [ 23,  43,  23, 110,   2],
       [ 23,  43,  23, 110,   1],
       [ 23,  43,  23, 110,   0],
       [ 83,  32,  12, 123,   4],
       [ 83,  32,  12, 123,   3],
       [ 83,  32,  12, 123,   2],
       [ 83,  32,  12, 123,   1],
       [ 83,  32,  12, 123,   0],
       [ 58,  41,  59, 189,   1],
       [ 58,  41,  59, 189,   0],
       [ 93,  77,  22, 170,   3],
       [ 93,  77,  22, 170,   2],
       [ 93,  77,  22, 170,   1],
       [ 93,  77,  22, 170,   0]])

CodePudding user response：

I've finally figured out a solution:

rep = np.repeat(A, A[:, -1]   1, axis=0)
rep[:, -1] = np.concatenate([np.arange(0, n 1)[::-1] for n in A[:, -1]])

Output:

>>> rep
array([[ 23,  43,  23, 110,   5],
       [ 23,  43,  23, 110,   4],
       [ 23,  43,  23, 110,   3],
       [ 23,  43,  23, 110,   2],
       [ 23,  43,  23, 110,   1],
       [ 23,  43,  23, 110,   0],
       [ 83,  32,  12, 123,   4],
       [ 83,  32,  12, 123,   3],
       [ 83,  32,  12, 123,   2],
       [ 83,  32,  12, 123,   1],
       [ 83,  32,  12, 123,   0],
       [ 58,  41,  59, 189,   1],
       [ 58,  41,  59, 189,   0],
       [ 93,  77,  22, 170,   3],
       [ 93,  77,  22, 170,   2],
       [ 93,  77,  22, 170,   1],
       [ 93,  77,  22, 170,   0]])