Let's assume that we are building a basic CNN that recognizes pictures of cats and dogs (a binary classifier).
An example of such a CNN is as follows:
model = Sequential([
    Conv2D(32, (3,3), input_shape=...),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(32, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(64, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Flatten(),
    Dense(64),
    Activation('relu'),
    Dropout(0.5),
    Dense(1),
    Activation('sigmoid')
])
Let's also assume that we want to split the model into two parts, or two models, called model_0 and model_1. model_0 will handle the input, and model_1 will take the output of model_0 as its input.
For example, the previous model would become:
model_0 = Sequential([
    Conv2D(32, (3,3), input_shape=...),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(32, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(64, (3,3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2))
])
model_1 = Sequential([
    Flatten(),
    Dense(64),
    Activation('relu'),
    Dropout(0.5),
    Dense(1),
    Activation('sigmoid')
])
How do I train the two models as if they were one single model? I have tried to compute the gradients manually, but I don't understand how to pass the gradients from model_1 to model_0:
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_generator):
        # model 0
        with tf.GradientTape() as tape_0:
            y_pred_0 = model_0(x_batch, training=True)
        # model 1
        with tf.GradientTape() as tape_1:
            y_pred_1 = model_1(y_pred_0, training=True)
            loss_value = loss_fn(y_batch, y_pred_1)
        grads_1 = tape_1.gradient(y_pred_1, model_1.trainable_weights)
        grads_0 = tape_0.gradient(y_pred_0, model_0.trainable_weights)
        optimizer.apply_gradients(zip(grads_1, model_1.trainable_weights))
        optimizer.apply_gradients(zip(grads_0, model_0.trainable_weights))
This method will of course not work: I am basically just training two models separately and then chaining them together, which is not what I want to achieve.
EDIT 1:
Please note that I am aware of Sequential([model_0, model_1]), but this is not what I want to achieve. I want to do the backpropagation step manually.
Any clues?
CodePudding user response:
I think model_final = Sequential([model_0, model_1]) would do the trick.
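A minimal sketch of that idea, assuming a standard binary cross-entropy setup, the Adam optimizer, and the train_generator from the question:

model_final = Sequential([model_0, model_1])

# Compile the chained model; Keras then backpropagates through both
# sub-models automatically during fit().
model_final.compile(optimizer='adam',
                    loss='binary_crossentropy',
                    metrics=['accuracy'])

model_final.fit(train_generator, epochs=epochs)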
CodePudding user response:
tf.GradientTape() can take an argument persistent, which controls whether a persistent gradient tape is created. It is False by default, which means at most one call can be made to the gradient() method on this object.
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_generator):
        with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape_0:
            # A single tape records the forward pass through both models.
            y_pred_0 = model_0(x_batch, training=True)
            y_pred_1 = model_1(y_pred_0, training=True)
            loss_value = loss_fn(y_batch, y_pred_1)
        # The persistent tape allows two gradient() calls, one per model,
        # both differentiating the same loss.
        grads_1 = tape_0.gradient(loss_value, model_1.trainable_weights)
        grads_0 = tape_0.gradient(loss_value, model_0.trainable_weights)
        optimizer.apply_gradients(zip(grads_1, model_1.trainable_weights))
        optimizer.apply_gradients(zip(grads_0, model_0.trainable_weights))
        del tape_0  # free the resources held by the persistent tape
Can you check if this responds to your needs?
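For reference, an equivalent sketch that avoids the persistent tape altogether: a single tape, a gradient of the loss, and one apply_gradients call over both models' variables; the chain rule then carries the gradients from model_1 back into model_0.

for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_generator):
        trainable_weights = model_0.trainable_weights + model_1.trainable_weights
        with tf.GradientTape() as tape:
            # Forward pass through both models under one tape.
            y_pred_0 = model_0(x_batch, training=True)
            y_pred_1 = model_1(y_pred_0, training=True)
            loss_value = loss_fn(y_batch, y_pred_1)
        # One gradient call over the concatenated variable list.
        grads = tape.gradient(loss_value, trainable_weights)
        optimizer.apply_gradients(zip(grads, trainable_weights))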