To be clear, below is what I am trying to do. The question is: how can I change the function oper_AB() so that, instead of the nested for loop, it uses numpy's vectorization/broadcasting to compute ret_list much faster?
import numpy as np

def oper(a_1D, b_1D):
    # Projection coefficient of a_1D onto b_1D: (a . b) / (b . b)
    return np.dot(a_1D, b_1D) / np.dot(b_1D, b_1D)

def oper_AB(A_2D, B_2D):
    # Apply oper() to every (row of A, row of B) pair, in row-major order
    ret_list = []
    for a_1D in A_2D:
        for b_1D in B_2D:
            ret_list.append(oper(a_1D, b_1D))
    return ret_list
CodePudding user response:
Strictly addressing the question (with the reservation that I suspect the OP wants the norm, not the norm squared, as divisor below):
r = a @ b.T / np.linalg.norm(b, axis=1)**2
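And if the divisor should indeed be the plain norm rather than the norm squared, a minimal sketch of that variant, under that assumption, simply drops the **2:
r = a @ b.T / np.linalg.norm(b, axis=1)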
Example:
np.random.seed(0)
a = np.random.randint(0, 10, size=(2,2))
b = np.random.randint(0, 10, size=(2,2))
Then:
>>> a
array([[5, 0],
       [3, 3]])
>>> b
array([[7, 9],
       [3, 5]])
>>> oper_AB(a, b)
[0.2692307692307692,
 0.4411764705882353,
 0.36923076923076925,
 0.7058823529411765]
>>> a @ b.T / np.linalg.norm(b, axis=1)**2
array([[0.26923077, 0.44117647],
       [0.36923077, 0.70588235]])
>>> np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
array([0.26923077, 0.44117647, 0.36923077, 0.70588235])
Speed:
n, m = 1000, 100
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
orig = %timeit -o oper_AB(a, b)
# 2.73 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 2.22 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
orig.average / new.average
# 1228.78 (speedup)
Our solution is 1200x faster than the original.
Correctness:
>>> np.allclose(np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2), oper_AB(a, b))
True
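For convenience, the expression can be wrapped as a drop-in replacement for oper_AB() (the name oper_AB_vec is mine, not from the question; note it returns an ndarray rather than a list):
def oper_AB_vec(A_2D, B_2D):
    # (n_a, n_b) matrix of all pairwise results, raveled in row-major
    # order to match the nested-loop order of the original
    return np.ravel(A_2D @ B_2D.T / np.linalg.norm(B_2D, axis=1)**2)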
Speed on large array, comparison to @Ahmed AEK's solution:
n, m = 2000, 2000
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 86.5 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
other = %timeit -o AEK(a, b) # Ahmed AEK's answer
# 102 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Our solution takes about 15% less time (86.5 ms vs 102 ms) :-)
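As a side note (my own variation, not from either answer): the squared norms can also be computed directly with np.einsum, skipping the square root that np.linalg.norm() takes and the **2 then undoes; whether that is measurably faster here is something to benchmark, not a given:
b_sq = np.einsum('ij,ij->i', b, b)  # row-wise squared norms, no sqrt
new2 = np.ravel(a @ b.T / b_sq)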
CodePudding user response:
This should work:
result = (np.matmul(A_2D, B_2D.transpose()) / np.sum(B_2D*B_2D, axis=1)).flatten()
But this second implementation will be faster because of cache utilization: B_2D is normalized once up front, so the large matmul output is written once and never revisited for the division.
def oper_AB(A_2D, B_2D):
    # Row-wise squared norms of B_2D, as a column vector for broadcasting
    b_squared = np.sum(B_2D*B_2D, axis=1).reshape([-1, 1])
    b_normalized = B_2D / b_squared
    del b_squared
    returned_val = np.matmul(A_2D, b_normalized.transpose())
    return returned_val.flatten()
The del is there only in case the memory allocated alongside B_2D gets too big (or it's just me being used to working with multi-GB arrays).
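A quick sanity check (my own addition, with small random inputs) that the one-liner and the cache-friendly version agree:
rng = np.random.default_rng(0)
a = rng.uniform(size=(4, 3))
b = rng.uniform(size=(5, 3))
one_liner = (np.matmul(a, b.transpose()) / np.sum(b*b, axis=1)).flatten()
print(np.allclose(one_liner, oper_AB(a, b)))  # True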