Is it correct to combine losses from different layers in Tensorflow?


Is it possible to combine the loss from different layers? Say I have one model with five layers.

The first loss (whatever it may be) is computed from the outputs of the third layer with some labels.

The second loss (say, your typical cross-entropy loss) is computed from the outputs of the last layer with your classification labels.

An example with code is below:

import tensorflow as tf

def get_intermediate_model(base_model, intermediate_layer):
    # Build a model that exposes the output of the given (1-indexed) intermediate layer.
    model = tf.keras.Model(inputs=base_model.inputs,
                           outputs=base_model.layers[intermediate_layer - 1].output)
    return model

def gradientCalculation(fullModel, inputData, intermediateLabels, labels):
    intermediateModel = get_intermediate_model(fullModel, 3)

    with tf.GradientTape() as tape:
        intermediateOutput = intermediateModel(inputData, training=True)
        classifierOutput = fullModel(inputData, training=True)

        # anyLossFunction is a placeholder for whatever loss fits the intermediate labels
        intermediate_layer_loss = anyLossFunction(intermediateLabels, intermediateOutput)
        classifier_loss = tf.keras.losses.CategoricalCrossentropy()(labels, classifierOutput)

        combinedFinalLoss = classifier_loss + (0.2 * intermediate_layer_loss)

    gradients = tape.gradient(combinedFinalLoss, fullModel.trainable_variables)
    return gradients

As you can see, the gradient is computed from the summed loss with respect to fullModel's trainable variables rather than intermediateModel's.

Is this sort of operation correct? Can tf.GradientTape() keep track of the loss that was computed from the intermediate layers and compute the gradients accordingly?

Or are we just adding more losses that the entire model is optimizing?

(The desired result is that the first 3 layers of fullModel receive two gradients summed together: one from the 1st loss, scaled by a coefficient of 0.2, and one from backpropagating the cross-entropy loss through all the layers. The 4th and 5th layers only receive gradients from the cross-entropy loss.)

CodePudding user response:

In your code, tf.GradientTape() does indeed keep track of the separate losses and will compute the gradients accordingly.

There is, however, a problem in your code: the same data goes through the network twice:

intermediateOutput = intermediateModel(inputData, training=True)
classifierOutput = fullModel(inputData, training=True)

From a computational point of view this is wasteful, since the first layers are evaluated twice.
You could instead define a single model that outputs both tensors:

model = tf.keras.Model(inputs=base_model.inputs, outputs=[base_model.layers[intermediate_layer-1].output, base_model.output])
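
With that two-output model, a single forward pass is enough. Here is a minimal sketch of the gradient computation, keeping the placeholder names from your question (anyLossFunction, inputData, intermediateLabels, labels) and the 0.2 weighting:

def gradientCalculation(model, inputData, intermediateLabels, labels):
    cce = tf.keras.losses.CategoricalCrossentropy()

    with tf.GradientTape() as tape:
        # One forward pass yields both the intermediate and the final output.
        intermediateOutput, classifierOutput = model(inputData, training=True)

        intermediate_layer_loss = anyLossFunction(intermediateLabels, intermediateOutput)
        classifier_loss = cce(labels, classifierOutput)

        combinedFinalLoss = classifier_loss + 0.2 * intermediate_layer_loss

    # The first 3 layers get gradient contributions from both losses; the later
    # layers only from the cross-entropy loss, since the intermediate loss does
    # not depend on them.
    return tape.gradient(combinedFinalLoss, model.trainable_variables)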

Note that you only need to set tf.GradientTape(persistent=True) if you call tape.gradient multiple times (in GANs for instance).
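
Regarding the gradient flow you expect: you can check that layers after the intermediate tap point receive nothing from the first loss by taking the gradient of that loss alone; tape.gradient returns None for variables the loss does not depend on. A small check, again using the placeholder names from your question:

with tf.GradientTape() as tape:
    intermediateOutput, _ = model(inputData, training=True)
    intermediate_layer_loss = anyLossFunction(intermediateLabels, intermediateOutput)

# The intermediate loss does not depend on the layers after the tap point,
# so their gradients come back as None.
print(tape.gradient(intermediate_layer_loss, model.layers[-1].trainable_variables))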
