Tensorflow NaN loss during training with Tf.math.is_nan function


I have written a custom loss function that returns a loss of 0 when the ground truth labels (6d vector) are NaN and otherwise returns the mean squared error. Either all 6 features in the label are NaN, or there are no NaNs.

My loss function looks like:

tf.reduce_mean(tf.where(tf.math.is_nan(true_labels),
                        tf.zeros_like(true_labels),
                        tf.square(tf.subtract(true_labels, predicted_labels))))

where true_labels and predicted_labels have shape (batch_size, 6), and only entire rows of either matrix can be NaN. I get NaN loss values in this case, even though I should be returning 0 for the loss when the ground truth is NaN. I have also tested a workaround: during preprocessing I replace all the NaN values with a large negative number (-1e4, which is outside the range of my data), and then test for these sentinels in my loss function with

tf.where(tf.math.less(true_labels, -9999),
         tf.zeros_like(true_labels),
         tf.square(tf.subtract(true_labels, predicted_labels)))
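
The preprocessing replacement itself is a one-liner along these lines (a sketch; the values shown are only illustrative):

import numpy as np

# Sketch of the preprocessing workaround described above: replace NaN labels
# with a sentinel far outside the data range
labels = np.array([[0., np.nan, np.nan, np.nan, np.nan, np.nan]])
labels = np.where(np.isnan(labels), -1e4, labels)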

This is a total hack, but it works nonetheless. Therefore, I believe the issue is with the tf.math.is_nan function, but I have no idea why it gives me NaN losses. Furthermore, I have tested the loss function outside of training mode on some labels I made artificially, and it does not return NaNs then. Any help is appreciated.
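
For reference, the out-of-training check was along these lines (a sketch with illustrative values; the forward pass comes back finite):

import numpy as np
import tensorflow as tf

true_labels = tf.constant([[1., 106., 189., 2.64826314, 19., 26.44962941],
                           [0., np.nan, np.nan, np.nan, np.nan, np.nan]])
predicted_labels = tf.zeros_like(true_labels)

loss = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels),
                               tf.zeros_like(true_labels),
                               tf.square(tf.subtract(true_labels, predicted_labels))))
tf.print(loss)  # finite value; no NaN appears in the forward pass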

This is my model below. It returns a (batch_size, 6)-shaped tensor. The first column is sigmoid-activated to lie in [0, 1] and is fed into a binary cross-entropy loss function (not included here, but I confirmed that the NaN is not coming from the binary loss). The remaining 5 columns are fed into the custom loss function defined above; a rough sketch of how the two losses fit together follows the model code.

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Reshape, Conv2D, BatchNormalization,
                                     Activation, MaxPool2D, Flatten, Dense)


def custom_activation(tensor):
    # Sigmoid-activate only the first output column; leave the remaining 5 linear
    first_node_sigmoid = tf.nn.sigmoid(tensor[:, :1])
    return tf.concat([first_node_sigmoid, tensor[:, 1:]], axis=1)


def gen_model():
    IMAGE_SIZE = 200
    CONV_PARAMS = {"kernel_size": 3, "use_bias": False, "padding": "same"}
    CONV_PARAMS2 = {"kernel_size": 5, "use_bias": False, "padding": "same"}

    model = Sequential()
    model.add(
        Reshape((IMAGE_SIZE, IMAGE_SIZE, 1), input_shape=(IMAGE_SIZE, IMAGE_SIZE))
    )
    model.add(Conv2D(16, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(32, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(64, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Conv2D(64, **CONV_PARAMS))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Conv2D(64, **CONV_PARAMS2))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(128, **CONV_PARAMS2))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(128, **CONV_PARAMS2))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Dense(6))
    model.add(tf.keras.layers.Lambda(custom_activation, name = "final_activation_layer"))
    return model
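
For what it's worth, the way the two losses fit together is roughly like this (a sketch, not my exact training code; combined_loss and the binary_crossentropy call are only illustrative of the split described above):

def combined_loss(true_labels, predicted_labels):
    # Column 0: binary cross-entropy on the sigmoid-activated output
    bce = tf.keras.losses.binary_crossentropy(true_labels[:, :1], predicted_labels[:, :1])
    # Columns 1-5: mean squared error that is supposed to ignore NaN ground truth
    reg_true = true_labels[:, 1:]
    reg_pred = predicted_labels[:, 1:]
    mse = tf.reduce_mean(tf.where(tf.math.is_nan(reg_true),
                                  tf.zeros_like(reg_true),
                                  tf.square(tf.subtract(reg_true, reg_pred))))
    return tf.reduce_mean(bce) + mse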

Here is an example of what the ground truth label looks like when the first feature is True (1):

[1., 106., 189., 2.64826314, 19., 26.44962941]

When the first feature is False (0), the label is:

[0, nan, nan, nan, nan, nan]

Update:

After some debugging with tf.print statements, I found that my predicted_labels are coming out as all NaN values. This issue does not occur when I use the 'hack' described above, so I don't think it is an issue with my data. I also checked that none of my images contain any NaNs after preprocessing, when used as input to the network. Somehow, with the loss function described above, I get NaNs in my predicted values, but I have no idea why. I have tried lowering the learning rate and the batch size, but this does not help.
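
For reference, the NaN check on the inputs was essentially of this form (a sketch; check_finite is an illustrative helper, not part of my actual pipeline):

import numpy as np

def check_finite(name, array):
    # Fail loudly if any NaN or Inf is present in the array
    if not np.all(np.isfinite(array)):
        raise ValueError(f"{name} contains NaN or Inf values")

# e.g. check_finite("train_images", train_images) after preprocessing,
# plus tf.print(predicted_labels) inside the loss to see the all-NaN predictions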

CodePudding user response:

Maybe something like the following could work for you. All NaN elements are first converted to 0, while the remaining elements stay the same. For example, [0, np.nan, np.nan, np.nan, np.nan, np.nan] becomes [0, 0, 0, 0, 0, 0], while [1., 106., 189., 2.64826314, 19., 26.44962941] remains untouched. Afterwards, your loss is only calculated for non-zero values; where true_labels are zero, you just return 0.
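
A likely reason the original formulation still blows up even though its forward pass is finite: tf.where masks the NaN branch only in the forward pass, while the gradient of the unselected branch is still computed and multiplied by zero, and 0 * NaN is NaN. Those NaN gradients then corrupt the weights, which matches the all-NaN predictions reported in the update. A minimal sketch of the effect (illustrative values):

import tensorflow as tf

y_true = tf.constant([[0., float("nan")]])
y_pred = tf.Variable([[0.1, 0.2]])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.where(tf.math.is_nan(y_true),
                                   tf.zeros_like(y_true),
                                   tf.square(y_true - y_pred)))

tf.print(loss)                         # finite in the forward pass
tf.print(tape.gradient(loss, y_pred))  # contains NaN at the masked position

Replacing the NaN labels before the subtraction, as in the loss below, keeps NaN out of both the forward pass and the gradient.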

import tensorflow as tf
import numpy as np

def custom_loss(true_labels, predicted_labels):
    # Replace NaN labels with 0 so the subtraction below never sees a NaN
    true_labels = tf.where(tf.math.is_nan(true_labels), tf.zeros_like(true_labels), true_labels)
    # Where the (cleaned) label is 0, contribute 0; otherwise use the squared error
    loss = tf.reduce_mean(
        tf.where(tf.equal(true_labels, 0.0), true_labels,
                 tf.square(tf.subtract(true_labels, predicted_labels))))
    return loss

def custom_activation(tensor):
    first_node_sigmoid = tf.nn.sigmoid(tensor[:, :1])
    return tf.concat([first_node_sigmoid, tensor[:, 1:]], axis = 1)


def gen_model():
    IMAGE_SIZE = 200
    CONV_PARAMS = {"kernel_size": 3, "use_bias": False, "padding": "same"}
    CONV_PARAMS2 = {"kernel_size": 5, "use_bias": False, "padding": "same"}

    model = tf.keras.Sequential()
    model.add(
        tf.keras.layers.Reshape((IMAGE_SIZE, IMAGE_SIZE, 1), input_shape=(IMAGE_SIZE, IMAGE_SIZE))
    )
    model.add(tf.keras.layers.Conv2D(16, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(32, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(64, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.Conv2D(64, **CONV_PARAMS))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.Conv2D(64, **CONV_PARAMS2))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(128, **CONV_PARAMS2))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Conv2D(128, **CONV_PARAMS2))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPool2D())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(64))
    model.add(tf.keras.layers.Dense(6))
    model.add(tf.keras.layers.Lambda(custom_activation, name = "final_activation_layer"))
    return model

Y_train = tf.constant([[1., 106., 189., 2.64826314, 19., 26.44962941], 
                       [0, np.nan, np.nan, np.nan, np.nan, np.nan]])
model = gen_model()
model.compile(loss=custom_loss, optimizer=tf.keras.optimizers.Adam())
model.fit(tf.random.normal((2, 200, 200)), Y_train, epochs=4)
Epoch 1/4
1/1 [==============================] - 1s 1s/step - loss: 4112.9380
Epoch 2/4
1/1 [==============================] - 0s 30ms/step - loss: 947.3030
Epoch 3/4
1/1 [==============================] - 0s 25ms/step - loss: 25.8993
Epoch 4/4
1/1 [==============================] - 0s 24ms/step - loss: 217.2151
<keras.callbacks.History at 0x7f8490b8db90>