Suppose I have two NumPy arrays, A (n1 x m) and B (n2 x m), and I want to apply a mathematical operation between every row of A and every row of B.
For example, say I want to compute the Euclidean distance between each row of A and each row of B and store the result in a new NumPy array C (n1 x n2).
The simple for-loop approach would look like this:
# C[i, j] holds the distance between row i of A and row j of B
C = np.zeros((A.shape[0], B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i, j] = np.linalg.norm(A[i] - B[j])
However, the implementation above is not very efficient. How could I rewrite it using vectorization to speed it up?
CodePudding user response:
You can add a new trailing axis to both arrays and transpose B, so that the subtraction broadcasts over every pair of rows:
# n1 x m x n2
diff = A[:, :, None] - B[:, :, None].T
# n1 x n2 after summing across m
dists = np.sqrt((diff * diff).sum(1))
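As a quick sanity check, here is a minimal sketch that compares the broadcasting version against the double loop from the question on small random inputs. The function names (`pairwise_loop`, `pairwise_broadcast`, `pairwise_gram`) and the test shapes are made up for illustration. The third function is not part of the answer above: it is an alternative based on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, which may be worth considering when the n1 x m x n2 intermediate is too large to hold in memory.

```python
import numpy as np

def pairwise_loop(A, B):
    """Reference implementation: the double loop from the question."""
    C = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            C[i, j] = np.linalg.norm(A[i] - B[j])
    return C

def pairwise_broadcast(A, B):
    """Broadcasting version; builds an (n1, m, n2) intermediate array."""
    diff = A[:, :, None] - B[:, :, None].T   # (n1, m, 1) - (1, m, n2)
    return np.sqrt((diff * diff).sum(axis=1))

def pairwise_gram(A, B):
    """Alternative (assumption, not from the answer): expand the squared norm
    so that only (n1, n2)-shaped intermediates are created."""
    sq = (A * A).sum(axis=1)[:, None] + (B * B).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    # Rounding can push tiny values slightly below zero, so clip before the sqrt.
    return np.sqrt(np.maximum(sq, 0.0))

# Hypothetical test data: n1 = 5, n2 = 4, m = 3
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(4, 3))

C_loop = pairwise_loop(A, B)
C_bcast = pairwise_broadcast(A, B)
C_gram = pairwise_gram(A, B)

print(C_loop.shape)  # (5, 4)
print(np.allclose(C_loop, C_bcast), np.allclose(C_loop, C_gram))
```

The broadcasting version trades memory for speed, since it materializes all n1 * m * n2 differences at once. The expanded form keeps memory at n1 * n2 but can lose a little precision when two rows are nearly identical, which is why the result is clipped at zero before taking the square root.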