I am trying to write a custom loss function for a CNN that detects an object and draws a non-axis-aligned bounding box around it. My inputs are 200x200 images and the outputs/labels are 6-dimensional vectors of the form
[object_present, x, y, angle, width, height]
where object_present is a binary feature indicating whether the object is present, (x, y) is the centre of the bounding box, angle is the rotation of the bbox away from axis-aligned, and width and height are the dimensions of the bbox. When object_present = 0, all the other features are set to NaN.
Consequently, my custom loss function needs to ignore the NaNs for a negative sample and apply only binary cross-entropy loss to the object_present feature. For positive samples, I also have to include an MSE loss for (x, y) and (width, height), and an angular regression loss that I have defined as atan2(sin(angle1 - angle2), cos(angle1 - angle2)). My implementation is as follows:
import tensorflow as tf

binary_loss_func = tf.keras.losses.BinaryCrossentropy()

def loss_func(true_labels, pred_labels):
    binary_loss = binary_loss_func(true_labels[:, 0], pred_labels[:, 0])
    mse_loss1 = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels[:, 1:3]),
                                        tf.zeros_like(true_labels[:, 1:3]),
                                        tf.square(tf.subtract(true_labels[:, 1:3], pred_labels[:, 1:3]))))
    mse_loss2 = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels[:, 4:]),
                                        tf.zeros_like(true_labels[:, 4:]),
                                        tf.square(tf.subtract(true_labels[:, 4:], pred_labels[:, 4:]))))
    angular_loss = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels[:, 3]),
                                           tf.zeros_like(true_labels[:, 3]),
                                           tf.abs(tf.atan2(tf.sin(true_labels[:, 3] - pred_labels[:, 3]),
                                                           tf.cos(true_labels[:, 3] - pred_labels[:, 3])))))
    return mse_loss1 + mse_loss2 + binary_loss + angular_loss
My issue is that this returns NaN loss values after the first batch of training (only the first batch gives a real-valued loss), even though I think the code above should return 0 loss for negative samples. I have confirmed that the binary cross-entropy term returns real numbers as it should, so the issue is with the other components of the loss. After some debugging with tf.print statements, I found that pred_labels becomes NaN after the first batch of training (a rough sketch of this check is shown after the model below). I am not sure why this is happening, or whether it is an issue with how my custom loss function is defined or with my model. The model I am using is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Flatten, MaxPool2D, Reshape)

IMAGE_SIZE = 200
CONV_PARAMS = {"kernel_size": 3, "use_bias": False, "padding": "same"}
CONV_PARAMS2 = {"kernel_size": 5, "use_bias": False, "padding": "same"}

model = Sequential()
model.add(
    Reshape((IMAGE_SIZE, IMAGE_SIZE, 1), input_shape=(IMAGE_SIZE, IMAGE_SIZE))
)
model.add(Conv2D(16, **CONV_PARAMS))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D())
model.add(Conv2D(32, **CONV_PARAMS))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D())
model.add(Conv2D(64, **CONV_PARAMS))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(6))
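For reference, the tf.print debugging mentioned above looks roughly like this (a sketch of the check, wrapped around the loss defined earlier, not the exact code I ran):

def debug_loss_func(true_labels, pred_labels):
    # log whether any prediction has gone NaN each time the loss is evaluated
    tf.print("NaN in pred_labels:",
             tf.reduce_any(tf.math.is_nan(pred_labels)))
    return loss_func(true_labels, pred_labels)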
CodePudding user response:
You seem to still be calculating your loss with NaN values, even though you are trying to mask them out. The problem is that tf.where only masks the forward pass: the gradient is still computed through the unselected branch, and since that branch contains NaN, the chain rule multiplies NaN by the zero mask, which is NaN again. Those NaN gradients corrupt your weights on the first update, which is why every prediction after the first batch is NaN.
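Here is a minimal sketch that reproduces the gradient behaviour (made-up values, just to illustrate the point):

import tensorflow as tf

x = tf.Variable([1.0])
nan_label = tf.constant([float("nan")])

with tf.GradientTape() as tape:
    # the forward pass is fine: the NaN branch is masked to zero
    loss = tf.where(tf.math.is_nan(nan_label),
                    tf.zeros_like(x),
                    tf.square(nan_label - x))

print(tape.gradient(loss, x))  # [nan]: the masked branch still poisons the gradient

So the fix is to replace the NaNs in true_labels before any arithmetic touches them, so that no branch of tf.where ever contains a NaN. Maybe try something like this: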
binary_loss_func = tf.keras.losses.BinaryCrossentropy()

def loss_func(true_labels, pred_labels):
    # zero out the NaNs first, so no branch of the tf.where calls below
    # ever produces a NaN value or a NaN gradient
    true_labels = tf.where(tf.math.is_nan(true_labels),
                           tf.zeros_like(true_labels), true_labels)
    # object_present is always 0 or 1, never NaN, so no masking is needed here
    binary_loss = binary_loss_func(true_labels[:, 0], pred_labels[:, 0])
    mse_loss1 = tf.reduce_mean(tf.where(tf.equal(true_labels[:, 1:3], 0.0),
                                        true_labels[:, 1:3],
                                        tf.square(tf.subtract(true_labels[:, 1:3], pred_labels[:, 1:3]))))
    mse_loss2 = tf.reduce_mean(tf.where(tf.equal(true_labels[:, 4:], 0.0),
                                        true_labels[:, 4:],
                                        tf.square(tf.subtract(true_labels[:, 4:], pred_labels[:, 4:]))))
    angular_loss = tf.reduce_mean(tf.where(tf.equal(true_labels[:, 3], 0.0),
                                           true_labels[:, 3],
                                           tf.abs(tf.atan2(tf.sin(true_labels[:, 3] - pred_labels[:, 3]),
                                                           tf.cos(true_labels[:, 3] - pred_labels[:, 3])))))
    return mse_loss1 + mse_loss2 + binary_loss + angular_loss
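One caveat with the zero check: a genuine centre coordinate or angle of exactly 0.0 would also be masked out. If that matters for your data, here is a sketch of an alternative that masks on the object_present flag instead (same loss terms, only the masking differs):

binary_loss_func = tf.keras.losses.BinaryCrossentropy()

def loss_func(true_labels, pred_labels):
    # clean the labels so the squared errors below are always finite
    true_clean = tf.where(tf.math.is_nan(true_labels),
                          tf.zeros_like(true_labels), true_labels)
    # 1.0 where an object is present, 0.0 otherwise
    present = true_clean[:, 0]

    binary_loss = binary_loss_func(true_clean[:, 0], pred_labels[:, 0])

    # weight each regression term by the presence flag, zeroing out negatives
    mse_loss1 = tf.reduce_mean(
        present[:, None] * tf.square(true_clean[:, 1:3] - pred_labels[:, 1:3]))
    mse_loss2 = tf.reduce_mean(
        present[:, None] * tf.square(true_clean[:, 4:] - pred_labels[:, 4:]))

    diff = true_clean[:, 3] - pred_labels[:, 3]
    angular_loss = tf.reduce_mean(
        present * tf.abs(tf.atan2(tf.sin(diff), tf.cos(diff))))

    return mse_loss1 + mse_loss2 + binary_loss + angular_loss

Separately, since your model's last layer is a plain Dense(6) with no sigmoid, you may also want BinaryCrossentropy(from_logits=True) for the object_present output.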