I am using custom Recall and Precision metrics in my model. I know they have them built into Keras but I only care about one of the classes.
As an epoch begins, the metrics print values, but after many steps one metric returns NaN, and a few hundred steps later the second custom metric shows NaN as well.
The recall metric is written in the same way.
import tensorflow as tf
import tensorflow.keras.backend as K

def precision(y_true, y_pred):
    '''
    Calculates precision metric over gun label
    Precision = TP / (TP + FP)
    '''
    # I only care about the last label
    y_true = y_true[:, -1]
    y_pred = y_pred[:, -1]
    y_pred = tf.where(y_pred > .5, 1, 0)
    y_pred = tf.cast(y_pred, tf.float32)
    y_true = tf.cast(y_true, tf.float32)
    true_positives = K.sum(y_true * y_pred)
    false_positive = tf.math.reduce_sum(
        tf.where(tf.logical_and(tf.not_equal(y_true, y_pred), tf.equal(y_pred, 1)), 1, 0))
    false_positive = tf.cast(false_positive, tf.float32)
    precision = true_positives / (true_positives + false_positive)
    return precision
I am training a multi-label model, so my last dense layer is:

preds = Dense(num_classes, activation='sigmoid', name='Classifier')(x)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy', precision, recall])
model.fit(train_ds, steps_per_epoch=10000, validation_data=valid_ds, validation_steps=1181, epochs=200)
18/10000 [............] - ETA: 6:43 - loss: 0.6919 - accuracy: 0.0046 - precision: 0.2597 - recall: 0.4691
315/10000 [...........] - ETA: 7:56 - loss: 0.4174 - accuracy: 0.1145 - precision: nan - recall: 0.6115
10000/10000 [=========>] - ETA: 0s - loss: 0.0797 - accuracy: 0.5432 - precision: nan - recall: nan
10000/10000 [=========>] - 576s 56ms/step - loss: 0.0797 - accuracy: 0.5432 - precision: nan - recall: nan - val_loss: 0.0557 - val_accuracy: 0.5807 - val_precision: 0.9698 - val_recall: 0.9529
At the beginning of each epoch, the metrics show numbers again, but after many steps they go back to NaN. From observation, I can confirm the values do not go to 0 or 1 right before turning NaN.
CodePudding user response:
The issue was a divide by zero. Adding a small value to each denominator solved the problem. The NaN occurs whenever the network makes no positive predictions in a batch, which is why it happened intermittently.
import tensorflow.keras.backend as K
precision = true_positives / (true_positives + false_positive + K.epsilon())
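To see why the epsilon matters, here is a minimal NumPy sketch of the same logic (the `precision_np` name and the 1e-7 constant are my own; Keras's `K.epsilon()` defaults to the same order of magnitude). With no positive predictions in a batch, the metric now returns 0.0 instead of NaN:

```python
import numpy as np

EPSILON = 1e-7  # comparable to the Keras backend default epsilon

def precision_np(y_true, y_pred):
    """Precision over the last label only: TP / (TP + FP + eps)."""
    y_true = np.asarray(y_true)[:, -1].astype(np.float32)
    # Threshold the sigmoid output at 0.5, as in the TF metric above
    y_pred = (np.asarray(y_pred)[:, -1] > 0.5).astype(np.float32)
    tp = np.sum(y_true * y_pred)
    fp = np.sum((y_true != y_pred) & (y_pred == 1))
    # Without EPSILON, tp == fp == 0 would yield 0/0 -> NaN
    return tp / (tp + fp + EPSILON)
```

Calling it on a batch where every last-label prediction is below 0.5 (tp = fp = 0) returns 0.0 rather than NaN, which matches the intermittent behavior seen in the training log.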