Is there any option to improve the time efficiency of this data normalization any further?


I have a matrix named tArray with shape (11, 512) and want to normalize the values in it. I see that np.max() costs a lot of time, but I didn't find any way to improve it further. Can the time efficiency of the following line of code be improved?

tArray = np.array([[val/tArray[i][sqLen-1] for val in tArray[i]] if i not in [1,2] else [val/np.max(tArray[i][:sqLen-1]) for val in tArray[i]] for i in range(len(tArray))])

To reproduce:

import numpy as np

tArray = np.random.randint(1, 100, size=(11, 512))
tArray = np.array([[val/tArray[i][512-1] for val in tArray[i]] if i not in [1,2] else [val/np.max(tArray[i][:512-1]) for val in tArray[i]] for i in range(len(tArray))])

CodePudding user response:

How about this for a 350x speedup (560x on floats)?

def f(a):
    d = a[:, -1].copy()          # default denominator: last element of each row
    d[1:3] = a[1:3, :-1].max(1)  # rows 1 and 2: max over all but the last column
    return a / d[:, None]        # broadcast the per-row denominators
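
For a quick sanity check (a minimal sketch; original_normalize is just the question's list comprehension wrapped in a hypothetical helper, and f is the function above):

import numpy as np

# Hypothetical reference: the question's one-liner as a function.
def original_normalize(t):
    return np.array(
        [[val / t[i][-1] for val in t[i]] if i not in [1, 2]
         else [val / np.max(t[i][:-1]) for val in t[i]]
         for i in range(len(t))]
    )

a = np.random.randint(1, 100, size=(11, 512))
assert np.allclose(original_normalize(a), f(a))  # f as defined above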

On float arrays, it's twice as fast as @Roman's answer. I would argue it is also a bit easier to read.

a = np.random.uniform(1, 100, size=(11, 512))

%timeit np.vstack((a[0]/a[0,-1], a[1:3,:]/a[1:3,:-1].max(), a[3:,:]/a[3:,-1][:,None]))
24.4 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit f(a)
11.8 µs ± 22.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

On int arrays, the difference is a bit less drastic (60% faster).

CodePudding user response:

Here is an approach with a ~180x speedup:

Note that for the shape of your input array, [512-1] is the same as [-1] (the last column) and [:512-1] is the same as [:-1].

The main condition of your loop, if i not in [1,2] else, tells us that the aggregations/calculations apply to exactly 3 slices: [0] (the first row), [1:3] (rows 1 and 2), and the remaining rows [3:].

So instead of iterating over each row and recalculating each column, we can apply the needed operations to the 3 sequential slices at once in a vectorized manner and finally concatenate the results with the np.vstack routine:

np.vstack((tArray[0]/tArray[0,-1], tArray[1:3]/tArray[1:3,:-1].max(), tArray[3:]/tArray[3:,-1][:,None]))

Let's look at the measurements:

tArray = np.random.randint(1, 100, size=(11, 512)) # input array

In [165]: %timeit tArray1 = np.array([[val/tArray[i][512-1] for val in tArray[i]] if i not in [1,2] else [val/np.max(tArray[i][:512-1]) for val in tArray[i]] for i in range(len(tArray))])
4.54 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [171]: %timeit new_arr = np.vstack((tArray[0]/tArray[0,-1], tArray[1:3]/tArray[1:3,:-1].max(), tArray[3:]/tArray[3:,-1][:,None]))
25.5 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Of course, tArray1 and new_arr have the same content:

In [173]: tArray1
Out[173]: 
array([[ 8.11111111,  2.33333333,  9.33333333, ...,  0.44444444,
         5.22222222,  1.        ],
       [ 0.76767677,  0.77777778,  0.72727273, ...,  0.58585859,
         0.29292929,  0.09090909],
       [ 0.36363636,  0.85858586,  0.35353535, ...,  0.06060606,
         0.48484848,  0.55555556],
       ...,
       [ 1.875     ,  2.04166667,  0.29166667, ...,  0.20833333,
         0.58333333,  1.        ],
       [ 0.28735632,  0.11494253,  0.37931034, ...,  0.50574713,
         0.74712644,  1.        ],
       [ 5.625     , 10.5       ,  0.5       , ...,  2.125     ,
         0.75      ,  1.        ]])

In [174]: new_arr
Out[174]: 
array([[ 8.11111111,  2.33333333,  9.33333333, ...,  0.44444444,
         5.22222222,  1.        ],
       [ 0.76767677,  0.77777778,  0.72727273, ...,  0.58585859,
         0.29292929,  0.09090909],
       [ 0.36363636,  0.85858586,  0.35353535, ...,  0.06060606,
         0.48484848,  0.55555556],
       ...,
       [ 1.875     ,  2.04166667,  0.29166667, ...,  0.20833333,
         0.58333333,  1.        ],
       [ 0.28735632,  0.11494253,  0.37931034, ...,  0.50574713,
         0.74712644,  1.        ],
       [ 5.625     , 10.5       ,  0.5       , ...,  2.125     ,
         0.75      ,  1.        ]])
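
A programmatic check (a one-line sketch on the arrays computed above) confirms the same thing:

assert np.allclose(tArray1, new_arr)  # raises if the two results ever diverge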

CodePudding user response:

Create an array of denominators, replacing the entries for the selected rows with the row max. Then divide the whole matrix by this array of denominators (you need to transpose the matrix to do this, then transpose it back again).

import numpy as np

t = np.random.randint(1, 100, size=(11, 512))
ignore = [1, 2]                              # rows normalized by their max instead
denoms = t[..., -1].copy()                   # last column as default denominators
denoms[ignore] = t[ignore, :-1].max(axis=1)  # selected rows: max over all but last column
result = (t.T / denoms).T                    # divide each row, via double transpose

This seems to be slightly faster than the vstack solution, and it also lets you choose which rows to treat specially a bit more cleanly.
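
As an aside, the double transpose can also be avoided by broadcasting against a column of denominators (a minimal sketch, equivalent to the code above):

import numpy as np

t = np.random.randint(1, 100, size=(11, 512))
ignore = [1, 2]
denoms = t[..., -1].copy()
denoms[ignore] = t[ignore, :-1].max(axis=1)
result = t / denoms[:, None]  # broadcast divide; same result as (t.T / denoms).T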
