To be clear, below is what I am trying to do. The question is: how can I change the function oper_AB() so that, instead of the nested for loop, it uses numpy's vectorization/broadcasting to compute ret_list much faster?
import numpy as np

def oper(a_1D, b_1D):
    # Projection coefficient of a_1D onto b_1D: (a . b) / (b . b)
    return np.dot(a_1D, b_1D) / np.dot(b_1D, b_1D)

def oper_AB(A_2D, B_2D):
    # Apply oper() to every (row of A, row of B) pair, in row-major order
    ret_list = []
    for a_1D in A_2D:
        for b_1D in B_2D:
            ret_list.append(oper(a_1D, b_1D))
    return ret_list
CodePudding user response:
Strictly addressing the question (with the reservation that I suspect the OP wants the norm, not the norm squared, as divisor below):
r = a @ b.T / np.linalg.norm(b, axis=1)**2
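And if the divisor should indeed be the plain norm rather than the norm squared, a minimal sketch of that variant, under that assumption, simply drops the **2:
r = a @ b.T / np.linalg.norm(b, axis=1)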
Example:
np.random.seed(0)
a = np.random.randint(0, 10, size=(2,2))
b = np.random.randint(0, 10, size=(2,2))
Then:
>>> a
array([[5, 0],
       [3, 3]])
>>> b
array([[7, 9],
       [3, 5]])
>>> oper_AB(a, b)
[0.2692307692307692,
 0.4411764705882353,
 0.36923076923076925,
 0.7058823529411765]
>>> a @ b.T / np.linalg.norm(b, axis=1)**2
array([[0.26923077, 0.44117647],
       [0.36923077, 0.70588235]])
>>> np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
array([0.26923077, 0.44117647, 0.36923077, 0.70588235])
Speed:
n, m = 1000, 100
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
orig = %timeit -o oper_AB(a, b)
# 2.73 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 2.22 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
orig.average / new.average
# 1228.78 (speedup)
Our solution is 1200x faster than the original.
Correctness:
>>> np.allclose(np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2), oper_AB(a, b))
True
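For convenience, the expression can be wrapped as a drop-in replacement for oper_AB() (the name oper_AB_vec is mine, not from the question; note it returns an ndarray rather than a list):
def oper_AB_vec(A_2D, B_2D):
    # (n_a, n_b) matrix of all pairwise results, raveled in row-major
    # order to match the nested-loop order of the original
    return np.ravel(A_2D @ B_2D.T / np.linalg.norm(B_2D, axis=1)**2)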
Speed on large array, comparison to @Ahmed AEK's solution:
n, m = 2000, 2000
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 86.5 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
other = %timeit -o AEK(a, b) # Ahmed AEK's answer
# 102 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Our solution takes about 15% less time (86.5 ms vs 102 ms) :-)
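As a side note (my own variation, not from either answer): the squared norms can also be computed directly with np.einsum, skipping the square root that np.linalg.norm() takes and the **2 then undoes; whether that is measurably faster here is something to benchmark, not a given:
b_sq = np.einsum('ij,ij->i', b, b)  # row-wise squared norms, no sqrt
new2 = np.ravel(a @ b.T / b_sq)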
CodePudding user response:
This should work:
result = (np.matmul(A_2D, B_2D.transpose()) / np.sum(B_2D*B_2D, axis=1)).flatten()
But this second implementation will be faster because of cache utilization: B_2D is normalized once up front, so the large matmul output is written once and never revisited for the division.
def oper_AB(A_2D, B_2D):
    # Row-wise squared norms of B_2D, as a column vector for broadcasting
    b_squared = np.sum(B_2D*B_2D, axis=1).reshape([-1, 1])
    b_normalized = B_2D / b_squared
    del b_squared
    returned_val = np.matmul(A_2D, b_normalized.transpose())
    return returned_val.flatten()
The del is there only in case the memory allocated alongside B_2D gets too big (or it's just me being used to working with multi-GB arrays).
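A quick sanity check (my own addition, with small random inputs) that the one-liner and the cache-friendly version agree:
rng = np.random.default_rng(0)
a = rng.uniform(size=(4, 3))
b = rng.uniform(size=(5, 3))
one_liner = (np.matmul(a, b.transpose()) / np.sum(b*b, axis=1)).flatten()
print(np.allclose(one_liner, oper_AB(a, b)))  # True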