More efficient dot-product with 'numpy.ndarray'

Time:10-30

My user_factors and item_factors are both of type numpy.ndarray. I want to efficiently calculate the dot product between one row of user_factors and every row of item_factors, and store the results in a numpy.ndarray. This is how I do it:

import numpy as np

...

user_coordinates = user_factors[user_id]

scores = []

for i in range(len(item_factors)):
    item_coordinates = item_factors[i]
    score = sum(i[0] * i[1] for i in zip(user_coordinates, item_coordinates))
    scores.append(score)


np_scores = np.hstack(scores)

Could this be done more efficiently (perhaps without the list to numpy.ndarray conversion at the end)?

CodePudding user response:

The dot product between the i-th row of A and the j-th row of B is given by C_{ij}:

import numpy as np

A = np.random.randint(10, size=(3, 3))
B = np.random.randint(10, size=(3, 3))
C = np.dot(A, B.T)  # C[i, j] is the dot product of row i of A and row j of B

If you want the dot product between one specific row of A, say the 2nd, and all the rows of B, then:

C = np.dot(A[1,:], B.T)
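
Applied to the names from the question, this might look like the sketch below. The array shapes, the random data and the value of user_id are assumptions made up for illustration; only user_factors, item_factors, user_id and np_scores come from the question. The loop is kept just to check that both approaches agree.

```python
import numpy as np

# Illustrative data only: 4 users and 6 items with 5 factors each (assumed shapes).
rng = np.random.default_rng(0)
user_factors = rng.integers(0, 10, size=(4, 5))
item_factors = rng.integers(0, 10, size=(6, 5))
user_id = 1  # hypothetical user index

# Loop version, equivalent to the code in the question.
loop_scores = np.array([
    sum(u * v for u, v in zip(user_factors[user_id], item))
    for item in item_factors
])

# Vectorized version: one row of user_factors against every row of item_factors.
np_scores = np.dot(user_factors[user_id], item_factors.T)

# Scores for every user at once; row user_id matches np_scores.
all_scores = np.dot(user_factors, item_factors.T)

assert np.array_equal(loop_scores, np_scores)
assert np.array_equal(all_scores[user_id], np_scores)
```

The result is already a numpy.ndarray, so no list-to-array conversion is needed at the end.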

CodePudding user response:

Let's assume some data:

np.random.seed(0)
user_coordinates = np.random.randint(0, 10, (10,))
item_factors = np.random.randint(0, 10, (15, 10))

Your loop then gives:

>>> np_scores
array([220, 156, 133, 178, 205, 269, 164, 184,  98, 159, 103, 182, 194,
       157, 188])

You can get the same result with a single NumPy call:

>>> item_factors.dot(user_coordinates)
array([220, 156, 133, 178, 205, 269, 164, 184,  98, 159, 103, 182, 194,
       157, 188])

The documentation of numpy.dot says:

  • If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
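
To make the "sum product over the last axis" wording concrete, here is a small sketch with hand-picked numbers (the values are made up for illustration and are not the data from the answer above). It compares the dot call with an explicit multiply-and-sum over the last axis.

```python
import numpy as np

# Two items with three factors each, and one user vector (illustrative values).
item_factors = np.array([[1, 2, 3],
                         [4, 5, 6]])
user_coordinates = np.array([1, 0, 2])

# 2-D array dotted with a 1-D array: sum product over the last axis of item_factors.
scores = item_factors.dot(user_coordinates)

# The same computation spelled out: broadcast multiply, then sum along the last axis.
explicit = (item_factors * user_coordinates).sum(axis=-1)

# The @ operator gives the same result for these shapes.
with_matmul = item_factors @ user_coordinates

print(scores)  # [ 7 16]
assert np.array_equal(scores, explicit)
assert np.array_equal(scores, with_matmul)
```

Each entry is one row's dot product with the user vector: 1*1 + 2*0 + 3*2 = 7 and 4*1 + 5*0 + 6*2 = 16.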