I have written a custom loss function that returns a loss of 0 when the ground truth labels (6d vector) are NaN and otherwise returns the mean squared error. Either all 6 features in the label are NaN, or there are no NaNs.
My loss function looks like:
tf.reduce_mean(tf.where(tf.math.is_nan(true_labels),
                        tf.zeros_like(true_labels),
                        tf.square(tf.subtract(true_labels, predicted_labels))))
where true_labels and predicted_labels have shape (batch_size, 6), and NaNs only ever occur as entire rows. Despite this, I get NaN loss values, even though the loss should be 0 when the ground truth is NaN. As a workaround, I also tried replacing all the NaN values with a large negative number (-1e4, which is outside the range of my data) during preprocessing, and then testing for that sentinel in my loss function with
tf.where(tf.math.less(true_labels, -9999),
         tf.zeros_like(true_labels),
         tf.square(tf.subtract(true_labels, predicted_labels)))
This is a total hack, but it works nonetheless. I therefore suspect the issue lies with tf.math.is_nan, although I have no idea why it produces NaN losses. Furthermore, I have tested the loss function outside of training on some artificial labels, and it does not return NaNs there. Any help is appreciated.
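For reference, the sentinel replacement in preprocessing is roughly the following (a minimal sketch; the function name is just illustrative):

import numpy as np

def replace_nan_labels(labels, sentinel=-1e4):
    # swap NaN entries in the (N, 6) label matrix for a sentinel outside the data range
    return np.where(np.isnan(labels), sentinel, labels)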
This is my model below. It returns a (batch_size, 6) shaped Tensor. The first column is sigmoid activated to lie in [0,1] and is fed into a binary cross entropy loss function (that I did not include here, but confirmed that the NaN is not coming from the binary loss). The remaining 5 columns are fed into the custom loss function defined above.
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Reshape, Conv2D, BatchNormalization,
                                     Activation, MaxPool2D, Flatten, Dense)

def custom_activation(tensor):
    # sigmoid-activate only the first output node; leave the other 5 outputs linear
    first_node_sigmoid = tf.nn.sigmoid(tensor[:, :1])
    return tf.concat([first_node_sigmoid, tensor[:, 1:]], axis=1)

def gen_model():
    IMAGE_SIZE = 200
    CONV_PARAMS = {"kernel_size": 3, "use_bias": False, "padding": "same"}
    CONV_PARAMS2 = {"kernel_size": 5, "use_bias": False, "padding": "same"}

    model = Sequential()
    model.add(
        Reshape((IMAGE_SIZE, IMAGE_SIZE, 1), input_shape=(IMAGE_SIZE, IMAGE_SIZE))
    )
    model.add(Conv2D(16, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(32, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(64, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Conv2D(64, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Conv2D(64, **CONV_PARAMS2))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(128, **CONV_PARAMS2))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(128, **CONV_PARAMS2))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Dense(6))
    model.add(tf.keras.layers.Lambda(custom_activation, name="final_activation_layer"))
    return model
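For completeness, I combine the binary loss and the custom loss over the output columns roughly like this (a sketch; the exact code in my training loop differs slightly, and as noted the binary part is not where the NaN comes from):

def combined_loss(true_labels, predicted_labels):
    # column 0: sigmoid output vs. binary indicator
    bce = tf.keras.losses.binary_crossentropy(true_labels[:, :1], predicted_labels[:, :1])
    # columns 1-5: the NaN-masked MSE described above
    masked = tf.where(tf.math.is_nan(true_labels[:, 1:]),
                      tf.zeros_like(true_labels[:, 1:]),
                      tf.square(tf.subtract(true_labels[:, 1:], predicted_labels[:, 1:])))
    return tf.reduce_mean(bce) + tf.reduce_mean(masked)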
Here is an example of what the ground truth label looks like when the first feature is True (1):
[ 1. 106. 189. 2.64826314 19. 26.44962941]
When the first feature is False (0), the label is
[0, nan, nan, nan, nan, nan]
Update:
After some debugging with tf.print statements, I found that my predicted_labels are coming out as all NaN values. This does not happen when I use the 'hack' described above, so I don't think it is an issue with my data. I also checked that none of my input images contain NaNs after preprocessing. Somehow, with the loss function described above, the predicted values themselves become NaN, and I have no idea why. Lowering the learning rate and batch size does not help.
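(The check itself was just a tf.print inside the loss, roughly like this sketch:)

def custom_loss_debug(true_labels, predicted_labels):
    # print the raw network outputs each step to see when the NaNs first appear
    tf.print("predicted_labels:", predicted_labels, summarize=-1)
    return tf.reduce_mean(
        tf.where(tf.math.is_nan(true_labels),
                 tf.zeros_like(true_labels),
                 tf.square(tf.subtract(true_labels, predicted_labels))))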
CodePudding user response:
Maybe something like the following could work for you. The reason the is_nan version blows up is most likely the gradient behaviour of tf.where: the forward pass masks the NaN entries correctly, but in the backward pass the gradient of the masked branch, tf.square(tf.subtract(true_labels, predicted_labels)), is itself NaN wherever true_labels is NaN, and multiplying that NaN by the zero selection mask still yields NaN. Those NaNs then flow into the weights and eventually into the predictions, which is exactly what you observed in your update.

The fix is to remove the NaNs before they enter the arithmetic at all: all NaN elements are first converted to 0, while the remaining elements stay the same. For example, [0, np.nan, np.nan, np.nan, np.nan, np.nan] becomes [0, 0, 0, 0, 0, 0], while [1., 106., 189., 2.64826314, 19., 26.44962941] remains untouched. Afterwards, the loss is only calculated for the non-zero values; where true_labels is zero, you simply return 0.
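You can reproduce the gradient issue in isolation; here is a minimal sketch with toy tensors (the names y_true/y_pred are just for illustration):

import tensorflow as tf

y_true = tf.constant([[0.5, float("nan")]])   # one valid entry, one NaN entry
y_pred = tf.Variable([[0.25, 0.25]])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(
        tf.where(tf.math.is_nan(y_true),
                 tf.zeros_like(y_true),
                 tf.square(y_true - y_pred)))

print(loss)                           # finite: the forward pass masks the NaN
print(tape.gradient(loss, y_pred))    # contains nan: the backward pass does not

With the labels sanitized up front instead, the full example below trains without producing NaNs: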
import tensorflow as tf
import numpy as np
def custom_loss(true_labels, predicted_labels):
    # convert NaN labels to 0 *before* any arithmetic so no NaN ever enters the graph
    true_labels = tf.where(tf.math.is_nan(true_labels), tf.zeros_like(true_labels), true_labels)
    # squared error only where the sanitized label is non-zero, 0 elsewhere
    loss = tf.reduce_mean(
        tf.where(tf.equal(true_labels, 0.0), true_labels,
                 tf.square(tf.subtract(true_labels, predicted_labels))))
    return loss
def custom_activation(tensor):
    first_node_sigmoid = tf.nn.sigmoid(tensor[:, :1])
    return tf.concat([first_node_sigmoid, tensor[:, 1:]], axis=1)

def gen_model():
    IMAGE_SIZE = 200
    CONV_PARAMS = {"kernel_size": 3, "use_bias": False, "padding": "same"}
    CONV_PARAMS2 = {"kernel_size": 5, "use_bias": False, "padding": "same"}

    model = tf.keras.Sequential()
    model.add(
        tf.keras.layers.Reshape((IMAGE_SIZE, IMAGE_SIZE, 1), input_shape=(IMAGE_SIZE, IMAGE_SIZE))
    )
    model.add(tf.keras.layers.Conv2D(16, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(32, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(64, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.Conv2D(64, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.Conv2D(64, **CONV_PARAMS2))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(128, **CONV_PARAMS2))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(128, **CONV_PARAMS2))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(64))
    model.add(tf.keras.layers.Dense(6))
    model.add(tf.keras.layers.Lambda(custom_activation, name="final_activation_layer"))
    return model
Y_train = tf.constant([[1., 106., 189., 2.64826314, 19., 26.44962941],
                       [0., np.nan, np.nan, np.nan, np.nan, np.nan]])

model = gen_model()
model.compile(loss=custom_loss, optimizer=tf.keras.optimizers.Adam())
model.fit(tf.random.normal((2, 200, 200)), Y_train, epochs=4)
Epoch 1/4
1/1 [==============================] - 1s 1s/step - loss: 4112.9380
Epoch 2/4
1/1 [==============================] - 0s 30ms/step - loss: 947.3030
Epoch 3/4
1/1 [==============================] - 0s 25ms/step - loss: 25.8993
Epoch 4/4
1/1 [==============================] - 0s 24ms/step - loss: 217.2151
<keras.callbacks.History at 0x7f8490b8db90>
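As a quick sanity check, you can also call custom_loss directly on a batch containing a NaN row (a small sketch reusing the tensors from above):

y_pred = tf.zeros_like(Y_train)
print(custom_loss(Y_train, y_pred))  # finite value, no nan: the NaN row contributes 0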