The triplet loss is defined as follows:
L(A, P, N) = max(‖f(A) - f(P)‖² - ‖f(A) - f(N)‖² + margin, 0)
where A is the anchor, P is the positive, and N is the negative sample in the loss, and margin is the minimum required gap between the anchor-positive distance and the anchor-negative distance.
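For concreteness, a minimal sketch of this L2-based triplet loss in TensorFlow (the function and argument names are just illustrative) looks like:

import tensorflow as tf

def triplet_loss_l2(anchor, positive, negative, margin=0.2):
    # squared L2 distances along the embedding (last) axis
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # hinge: penalize only when the positive is not closer than the negative by at least margin
    return tf.maximum(pos_dist - neg_dist + margin, 0.0)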
I read somewhere that (1 - cosine_similarity) may be used instead of the L2 distance.
Note that I am using TensorFlow, and its cosine similarity loss is defined as the negative of the cosine similarity: it is a number between -1 and 1, where 0 indicates orthogonality, values closer to -1 indicate greater similarity, and values closer to 1 indicate greater dissimilarity. So it is the opposite of the cosine similarity metric.
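To illustrate that sign convention, here is a small check using tf.keras.losses.cosine_similarity (the example vectors are arbitrary):

import tensorflow as tf

a = tf.constant([[1.0, 0.0]])
b = tf.constant([[1.0, 0.0]])   # same direction as a
c = tf.constant([[0.0, 1.0]])   # orthogonal to a
d = tf.constant([[-1.0, 0.0]])  # opposite direction to a

# The Keras loss returns the negative cosine similarity.
print(tf.keras.losses.cosine_similarity(a, b).numpy())  # -> -1.0 (most similar)
print(tf.keras.losses.cosine_similarity(a, c).numpy())  # ->  0.0 (orthogonal)
print(tf.keras.losses.cosine_similarity(a, d).numpy())  # ->  1.0 (most dissimilar)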
Another resource I found is the cosine similarity layer here, but it is not a triplet loss.
Any suggestions on how to write my triplet loss with cosine similarity?
CodePudding user response:
First of all, cosine_distance = 1 - cosine_similarity. Distance and similarity are different things, and this is not stated correctly in some of the answers!
Secondly, you should look at how the cosine similarity loss is implemented in the TensorFlow/Keras code: https://github.com/keras-team/keras/blob/v2.9.0/keras/losses.py#L2202-L2272. It is different from the PyTorch implementation.
Finally, I suggest you use an existing loss: replace the ‖ ... ‖² term with tf.losses.cosineDistance(...).
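For example, a minimal sketch of that idea in Python TensorFlow, using tf.keras.losses.cosine_similarity (which returns the negative cosine similarity) to build a cosine distance; the function names and the margin value below are illustrative:

import tensorflow as tf

def cosine_distance(a, b):
    # tf.keras.losses.cosine_similarity returns -cos(a, b),
    # so adding 1 gives the cosine distance 1 - cos(a, b)
    return 1.0 + tf.keras.losses.cosine_similarity(a, b, axis=-1)

def triplet_loss_cosine(anchor, positive, negative, margin=0.5):
    pos_dist = cosine_distance(anchor, positive)
    neg_dist = cosine_distance(anchor, negative)
    return tf.maximum(pos_dist - neg_dist + margin, 0.0)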
CodePudding user response:
I am guessing that what you read about replacing the L2 distance with cosine originates from the definition of the cosine between two vectors:
cos(f(A), f(P)) = f(A) * f(P)/(‖f(A)‖*‖f(P)‖)
where the dot product along the feature dimension is implied in the above. Next, note that
2*[1 - cos(f(A), f(P))]*‖f(A)‖*‖f(P)‖ = ‖f(A) - f(P)‖² - (‖f(A)‖ - ‖f(P)‖)²
(this follows from expanding ‖f(A) - f(P)‖² = ‖f(A)‖² + ‖f(P)‖² - 2*f(A)*f(P) and substituting f(A)*f(P) = cos(f(A), f(P))*‖f(A)‖*‖f(P)‖), which hints at where the notion comes from when ‖f(A)‖ = ‖f(P)‖. So your formula can naturally be changed to
L(A, P, N) = max(cos(f(A), f(N)) - cos(f(A), f(P)) + margin, 0)
Your margin parameter should be adjusted accordingly. Here is some TensorFlow code to compute the cosine between vectors:
import tensorflow as tf

def cos(A, B):
    # cosine similarity along the last (feature) axis
    return tf.reduce_sum(A * B, axis=-1) / tf.norm(A, axis=-1) / tf.norm(B, axis=-1)
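Putting it together with the cos helper above, a minimal sketch of the resulting triplet loss (the function name and default margin are illustrative) would be:

def triplet_loss_cos(A, P, N, margin=0.5):
    # max(cos(f(A), f(N)) - cos(f(A), f(P)) + margin, 0), computed per sample
    return tf.maximum(cos(A, N) - cos(A, P) + margin, 0.0)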
Whether this loss would benefit your particular problem depends on the problem, so good luck with your experiments.