I created a cosine similarity function, which gives the correct results when called with individual vectors, but when I supply a list of vectors I suddenly get different results. Isn't numpy supposed to calculate the formula for every element in the list? Is my understanding wrong?
Cosine similarity:
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
Example:
a = [1, 2, 3]
b = [4, 5, 6]
print(cosine_similarity(a, a), cosine_similarity(a, b), cosine_similarity(a, [a, b]))
With the result:
1.0 0.9746318461970762 [0.39223227 0.8965309 ]
The first two values are correct. The values in the array should be the same as those two, but they aren't. Is this just not possible, or do I have to change something?
CodePudding user response:
Your understanding is actually correct. Many functions in numpy accept an axis keyword argument. np.linalg.norm, for example, computes the norm along the specified axis. In your case axis is not specified, so norm calculates a single norm for the whole 2x3 matrix [a, b] (the Frobenius norm) instead of calculating one norm per row.
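A small sketch of that difference, using the same numbers as in the question (the variable names m, whole and per_row are only for illustration):

```python
import numpy as np

# The 2x3 matrix [a, b] from the question.
m = np.array([[1, 2, 3], [4, 5, 6]])

# No axis: one scalar for the whole matrix (the Frobenius norm), sqrt(91).
whole = np.linalg.norm(m)

# axis=-1: one norm per row, sqrt(14) and sqrt(77).
per_row = np.linalg.norm(m, axis=-1)

print(whole)
print(per_row)
```

Dividing both inner products by the single scalar is what produces the unexpected values in the question.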
To fix the code, pass axis=-1 to the norm of the second argument:
def cosine_similarity(vec1, vec2):
    return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2, axis=-1))
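As a quick check, here is a self-contained sketch that compares the batched call against the individual calls. It assumes vec1 is always a single 1-D vector and only vec2 may be a list of vectors. The np.asarray conversions are an addition for illustration and not part of the fix itself.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # vec1: a single 1-D vector; vec2: a 1-D vector or a 2-D stack of row vectors.
    vec1 = np.asarray(vec1, dtype=float)
    vec2 = np.asarray(vec2, dtype=float)
    # axis=-1 gives one norm per row of vec2 (or a scalar if vec2 is 1-D).
    return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2, axis=-1))

a = [1, 2, 3]
b = [4, 5, 6]

single = [cosine_similarity(a, a), cosine_similarity(a, b)]
batched = cosine_similarity(a, [a, b])

# The batched result should now agree with the two individual results.
print(single, batched)
```

If vec1 were also a list of vectors, np.inner would return a full matrix of pairwise inner products and the norm of vec1 would need its own axis handling, so this version does not cover that case.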