Method with numpy gives different result when called with array

Time:12-10

I created a cosine similarity method that gives the correct results when called with individual vectors, but when I supply a list of vectors I suddenly get different results. Isn't numpy supposed to calculate the formula for every element in the list? Is my understanding wrong?

Cosine similarity:

import numpy as np
def cosine_similarity(vec1, vec2):
  return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

Example:

a = [1, 2, 3]
b = [4, 5, 6]
print(cosine_similarity(a, a), cosine_similarity(a, b), cosine_similarity(a, [a, b]))

With the result:

1.0 0.9746318461970762 [0.39223227 0.8965309 ]

The first two values are correct. The values in the array should be the same as those two, but they aren't. Is this just not possible, or do I have to change something?

CodePudding user response:

Your understanding is mostly correct: np.inner does handle every vector in the list, but the norm is a reduction and needs to be told which axis to reduce over. Many functions in numpy accept a keyword argument axis for this, and np.linalg.norm computes the norm along the specified axis. In your case axis is not specified, so norm calculates a single norm for the whole 2x3 matrix [a, b] (the Frobenius norm) instead of calculating the norm per row. To fix the code, pass axis=-1 for the second argument:

def cosine_similarity(vec1, vec2):
  return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2, axis=-1))
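
To see the difference for yourself, here is a small self-contained sketch using the a and b from the question. The variable names whole_matrix_norm, row_norms and batched are only illustrative, and the code assumes vec1 is always a single vector while vec2 may be one vector or a list of vectors.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # axis=-1 computes one norm per row when vec2 is a stack of vectors,
    # and still works when vec2 is a single 1-D vector.
    return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2, axis=-1))

a = [1, 2, 3]
b = [4, 5, 6]

# Without axis, a 2-D input yields one scalar for the whole matrix.
whole_matrix_norm = np.linalg.norm([a, b])

# With axis=-1, there is one norm per row: [norm(a), norm(b)].
row_norms = np.linalg.norm([a, b], axis=-1)

# The batched call now agrees with the two single-vector calls.
batched = cosine_similarity(a, [a, b])

print(whole_matrix_norm)  # a single number
print(row_norms)          # an array with two entries
print(batched)            # approximately [1.0, 0.9746]
```

Note that this only handles a list of vectors in the second argument. If vec1 could also be a list of vectors, its norm would need an axis as well, and the shapes would have to be lined up for the division to broadcast correctly.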