Efficient numpy row-wise matrix multiplication using 3d arrays

I have two 3d arrays of shape (N, M, D), and I want to efficiently compute a matrix product for each index along the first axis (arr1[i].T @ arr2[i] for every i in range(N)), so that the resulting array has shape (N, D, D).

The following inefficient code shows what I am trying to achieve:

import numpy as np

N = 100
M = 10
D = 50
arr1 = np.random.normal(size=(N, M, D))
arr2 = np.random.normal(size=(N, M, D))
result = []
for i in range(N):
    # (D, M) @ (M, D) -> (D, D) for each slice i
    result.append(arr1[i].T @ arr2[i])
result = np.array(result)  # shape (N, D, D)

However, this approach is quite slow for large N because of the Python loop. Is there a more efficient way to do this computation without a loop? I have already tried to find a solution with tensordot and einsum, to no avail.

CodePudding user response:

The vectorized solution is to swap the last two axes of arr1 and let @ broadcast the matrix product over the first axis:

>>> import numpy as np
>>> N, M, D = 2, 3, 4
>>> np.random.seed(0)
>>> arr1 = np.random.normal(size=(N, M, D))
>>> arr2 = np.random.normal(size=(N, M, D))
>>> arr1.transpose(0, 2, 1) @ arr2
array([[[ 6.95815626,  0.38299107,  0.40600482,  0.35990016],
        [-0.95421604, -2.83125879, -0.2759683 , -0.38027618],
        [ 3.54989101, -0.31274318,  0.14188485,  0.19860495],
        [ 3.56319723, -6.36209602, -0.42687188, -0.24932248]],

       [[ 0.67081341, -0.08816343,  0.35430089,  0.69962394],
        [ 0.0316968 ,  0.15129449, -0.51592291,  0.07118177],
        [-0.22274906, -0.28955683, -1.78905988,  1.1486345 ],
        [ 1.68432706,  1.93915798,  2.25785798, -2.34404577]]])
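
As a sanity check, the batched product can be compared against the original loop, and the same contraction can also be written with np.einsum, which the question mentions trying. This is only a minimal sketch: the subscript string 'nmi,nmj->nij', the variable names, and the use of np.random.default_rng are my own choices and not part of the original answer.

```python
import numpy as np

N, M, D = 2, 3, 4
rng = np.random.default_rng(0)
arr1 = rng.normal(size=(N, M, D))
arr2 = rng.normal(size=(N, M, D))

# Reference: explicit Python loop over the first axis.
looped = np.array([arr1[i].T @ arr2[i] for i in range(N)])

# Batched matmul: @ treats the leading axis as a batch dimension,
# so (N, D, M) @ (N, M, D) gives (N, D, D).
batched = arr1.transpose(0, 2, 1) @ arr2

# Equivalent einsum: sum over m for each n, output indexed as (n, i, j).
via_einsum = np.einsum('nmi,nmj->nij', arr1, arr2)

print(batched.shape)
print(np.allclose(looped, batched), np.allclose(looped, via_einsum))
```

Note that transpose(0, 2, 1) returns a view rather than a copy, so the axis swap itself does not duplicate arr1.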

A simple benchmark for a very large N:

In [225]: arr1.shape
Out[225]: (100000, 10, 50)

In [226]: %%timeit
     ...: result = []
     ...: for i in range(N):
     ...:     result.append(arr1[i].T @ arr2[i])
     ...: result = np.array(result)
     ...:
     ...:
12.4 s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [227]: %timeit arr1.transpose(0, 2, 1) @ arr2
906 ms ± 26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
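
To reproduce a comparison like this outside IPython, something along the following lines could be used. It is a sketch under assumptions: the function names loop_version and batched_version are hypothetical, the sizes are deliberately smaller than in the benchmark above so the script finishes quickly, and the absolute timings will vary with the machine and the BLAS library NumPy is linked against.

```python
import timeit
import numpy as np

def loop_version(a, b):
    """Per-slice product a[i].T @ b[i], stacked into shape (N, D, D)."""
    return np.array([a[i].T @ b[i] for i in range(a.shape[0])])

def batched_version(a, b):
    """Same result via a single batched matmul."""
    return a.transpose(0, 2, 1) @ b

# Assumed sizes, smaller than the benchmark above so this runs quickly.
N, M, D = 1000, 10, 50
rng = np.random.default_rng(0)
arr1 = rng.normal(size=(N, M, D))
arr2 = rng.normal(size=(N, M, D))

# Average wall-clock time per call over a few repetitions.
repeats = 5
t_loop = timeit.timeit(lambda: loop_version(arr1, arr2), number=repeats) / repeats
t_batched = timeit.timeit(lambda: batched_version(arr1, arr2), number=repeats) / repeats

print(f"loop:    {t_loop * 1e3:.2f} ms per call")
print(f"batched: {t_batched * 1e3:.2f} ms per call")
```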