Suppose I have an array A of size (M,N,D,D) and a vector x of size (M,N,D). I want to produce the array b of size (M,N,D) where b[m,n,:] = np.matmul(A[m,n,:,:],x[m,n,:]). So basically, the D-by-D matrix contained in A[m,n,:] and the vector of size D contained in x[m,n,:] should be multiplied together following standard matrix-vector multiplication. A naive for-loop implementation would be
for m in range(M):
for n in range(N):
b[m,n,:] = np.matmul(A[m,n,:],x[m,n,:])
Obviously this is inefficient for large N and M (which is the case for me; N is of the order of 100,000 and M is of the order of 1000). My question is: How to vectorize this loop? I've tried using np.tensordot but I keep getting shape mismatches no matter how I try to structure it, presumably because A has 4 axes whereas x has 3 (Note that for all I know, np.tensordot is the right function to use, I'm just not using it correctly)
EDIT Tried using np.einsum, but I don't get the expected result (which should be [5,11]:
CodePudding user response:
This is where np.einsum
shines:
np.einsum('mnij,mnj->mni', A,B)
Check with
np.allclose(np.einsum('mnij,mnj->mni', A,x), b)
returns True
CodePudding user response:
The same result can be achieved with broadcasting:
x1 = x.reshape(M,N,d,1)
b1 = np.matmul(A,x1)
b2 = b1.reshape(M,N,d)