Let's assume that we are building a basic CNN that recognizes pictures of cats and dogs (a binary classifier).
An example of such a CNN is as follows:
model = Sequential([
    Conv2D(32, (3,3), input_shape=...),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(32, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(64, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Flatten(),
    Dense(64),
    Activation('relu'),
    Dropout(0.5),
    Dense(1),
    Activation('sigmoid')
])
Let's also assume that we want to split the model into two parts, or two models, called model_0 and model_1. model_0 will handle the input, and model_1 will take the output of model_0 as its input.
For example, the previous model would become:
model_0 = Sequential([
    Conv2D(32, (3,3), input_shape=...),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(32, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(64, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2))
])
model_1 = Sequential([
    Flatten(),
    Dense(64),
    Activation('relu'),
    Dropout(0.5),
    Dense(1),
    Activation('sigmoid')
])
How do I train the two models as if they were one single model? I have tried to compute the gradients manually, but I don't understand how to pass the gradients from model_1 to model_0:
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_generator):
        # model 0
        with tf.GradientTape() as tape_0:
            y_pred_0 = model_0(x_batch, training=True)
        # model 1
        with tf.GradientTape() as tape_1:
            y_pred_1 = model_1(y_pred_0, training=True)
            loss_value = loss_fn(y_batch, y_pred_1)
        grads_1 = tape_1.gradient(y_pred_1, model_1.trainable_weights)
        grads_0 = tape_0.gradient(y_pred_0, model_0.trainable_weights)
        optimizer.apply_gradients(zip(grads_1, model_1.trainable_weights))
        optimizer.apply_gradients(zip(grads_0, model_0.trainable_weights))
This method will of course not work: I am basically just training two models separately and then chaining them together, which is not what I want to achieve.
EDIT 1:
Please note that I am aware of Sequential([model_0, model_1]), but this is not what I want to achieve. I want to do the backpropagation step manually.
Any clues?
CodePudding user response:
I think model_final = Sequential([model_0, model_1]) would do the trick.
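A minimal sketch of that idea, assuming a standard binary cross-entropy setup, the Adam optimizer, and the train_generator from the question:

model_final = Sequential([model_0, model_1])

# Compile the chained model; Keras then backpropagates through both
# sub-models automatically during fit().
model_final.compile(optimizer='adam',
                    loss='binary_crossentropy',
                    metrics=['accuracy'])

model_final.fit(train_generator, epochs=epochs)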
CodePudding user response:
tf.GradientTape() can take an argument persistent, which controls whether a persistent gradient tape is created. It is False by default, which means at most one call can be made to the gradient() method on this object.
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_generator):
        with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape_0:
            # A single tape records the forward pass through both models.
            y_pred_0 = model_0(x_batch, training=True)
            y_pred_1 = model_1(y_pred_0, training=True)
            loss_value = loss_fn(y_batch, y_pred_1)
        # The persistent tape allows two gradient() calls, one per model,
        # both differentiating the same loss.
        grads_1 = tape_0.gradient(loss_value, model_1.trainable_weights)
        grads_0 = tape_0.gradient(loss_value, model_0.trainable_weights)
        optimizer.apply_gradients(zip(grads_1, model_1.trainable_weights))
        optimizer.apply_gradients(zip(grads_0, model_0.trainable_weights))
        del tape_0  # free the resources held by the persistent tape
Can you check if this responds to your needs?
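For reference, an equivalent sketch that avoids the persistent tape altogether: a single tape, a gradient of the loss, and one apply_gradients call over both models' variables; the chain rule then carries the gradients from model_1 back into model_0.

for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_generator):
        trainable_weights = model_0.trainable_weights + model_1.trainable_weights
        with tf.GradientTape() as tape:
            # Forward pass through both models under one tape.
            y_pred_0 = model_0(x_batch, training=True)
            y_pred_1 = model_1(y_pred_0, training=True)
            loss_value = loss_fn(y_batch, y_pred_1)
        # One gradient call over the concatenated variable list.
        grads = tape.gradient(loss_value, trainable_weights)
        optimizer.apply_gradients(zip(grads, trainable_weights))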