I am trying to train a CNN classifier with 3 classes and am troubleshooting my loss function. I am testing tf.keras.losses.CategoricalCrossentropy() and tf.keras.losses.categorical_crossentropy().numpy(), following the standalone usage guide from the TensorFlow documentation.
I think that I am not getting the proper outputs that I should be.
When I input y_true=[0.,1.,0.] and y_pred=[1.,0.,0.] I expect a loss of infinity (output in the program: nan). However, the output I receive is 16.118095. When the classification aligns with the label (i.e. y_true=[1.,0.,0.] and y_pred=[1.,0.,0.]) the output is 1.192093e-07, even though I would expect a perfect 0.
I am really perplexed by this behavior. Similarly, in the length-1 vector case, y_true=[1.] and y_pred=[0.] give a loss of 16.118095; when the classification aligns (y_true=[1.] and y_pred=[1.]) I receive 1.192093e-07; and y_true=[0.] and y_pred=[0.] give nan.
To make this more readable, the table below summarizes the values I pass to the loss functions, the results I get, and the results I expect:
y_true | y_pred | Actual Output | What I Expect |
---|---|---|---|
[0.,1.,0.] | [1.,0.,0.] | 16.118095 | nan or infinity |
[1.,0.,0.] | [1.,0.,0.] | 1.192093e-07 | True 0 |
[0.,1.] | [1.,0.] | 16.118095 | nan or infinity |
[1.,0.] | [1.,0.] | 1.192093e-07 | True 0 |
[1.] | [0] | nan or infinity | nan or infinity |
[1.] | [1.] | 1.192093e-07 | True 0 |
I am sorry if this is a trivial question, but I really don't know why I am getting the results that I am getting. I think something is wrong because I am only getting 16 and not infinity, but if nothing is going wrong I'd like the reassurance. If I am wrong, I would really appreciate the correction.
CodePudding user response:
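The reason is that tf.keras.losses.categorical_crossentropy clips y_pred to the range [1e-7, 1 - 1e-7], so a prediction of exactly zero or one is shifted by a small offset (1e-7) before the logarithm is taken. That is why you don't see the output you expect: a true class predicted as 0 gives -log(1e-7) ≈ 16.118095 instead of infinity, and a true class predicted as 1 gives -log(1 - 1e-7), which is 1.192093e-07 after float32 rounding, instead of exactly 0.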
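import tensorflow as tf

def categorical_crossentropy(y_true, y_pred, clip=False):
    # Optionally clip predictions away from exactly 0 and 1, as Keras does
    if clip:
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
    # nansum skips the NaN terms produced by 0 * log(0)
    return -tf.experimental.numpy.nansum(y_true * tf.math.log(y_pred))
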
y_true = [0., 1., 0.]
y_pred = [1., 0., 0.]
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
# 16.118095
print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
# 16.118095
print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
# inf
y_true = [1., 0., 0.]
y_pred = [1., 0., 0.]
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
# 1.1920929e-07
print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
# 1.1920929e-07
print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
# -0.0
y_true = [0., 1., 0.]
y_pred = [0.05, 0.95, 0.]
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
# 0.051293306
print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
# 0.051293306
print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
# 0.051293306
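
The two surprising values can also be reproduced without TensorFlow, which may help show that they come purely from the clipping arithmetic. The following is a minimal sketch, not the Keras implementation: the names EPSILON, to_float32 and clipped_cross_entropy are made up for illustration, the epsilon of 1e-7 is assumed to match the Keras default, and the float32 rounding is emulated with the struct module.

```python
import math
import struct

EPSILON = 1e-7  # assumption: same value as the default Keras epsilon


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def clipped_cross_entropy(y_true, y_pred, eps=EPSILON):
    """Cross-entropy with y_pred clipped to [eps, 1 - eps] in float32."""
    lo = to_float32(eps)
    hi = to_float32(1.0 - eps)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clipping keeps p strictly inside (0, 1), so log(p) is always finite
        p = min(max(to_float32(p), lo), hi)
        total -= t * math.log(p)
    return total


# Wrong class predicted with full confidence: roughly 16.118, not infinity
print(clipped_cross_entropy([0., 1., 0.], [1., 0., 0.]))

# Correct class predicted with full confidence: roughly 1.19e-07, not 0
print(clipped_cross_entropy([1., 0., 0.], [1., 0., 0.]))
```

In other words, 16.118095 is simply the largest loss a single one-hot sample can produce once predictions are clipped, and 1.192093e-07 is the smallest, so neither value indicates a bug in the model or in the loss call.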