Is it possible to revert to a past state of a CNN model in between checkpoints?

I'm working with a U-Net GAN. I'm making checkpoints every 5000 steps, and around step 19000 the result was very good, but the system collapsed into junk results immediately after, before it could reach 20000 and checkpoint again. I am assuming the discriminator got outdone by too big of a jump to recover.

Is there any way to revert to this state of the generator model before it went unchallenged for a few hundred steps and became stupid? Like 900-ish steps before the 20k checkpoint.

CodePudding user response:

I don't think it is possible to revert to a state from partway through training that was not saved as a checkpoint. To prevent this from happening next time, you can use a callback that saves the model whenever a chosen performance metric improves, for instance accuracy or validation loss. Here is Python code to do that:

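# Assumes TensorFlow's bundled Keras; with standalone Keras, import from keras.callbacks instead.
from tensorflow.keras.callbacks import ModelCheckpoint

# Save the full model each time the monitored validation loss reaches a new minimum.
chk = ModelCheckpoint(model_directory_name,  # save path, defined elsewhere
                      monitor="val_loss",
                      verbose=1,
                      save_best_only=True,
                      save_weights_only=False,
                      mode="min",
                      save_freq="epoch")  # replaces the deprecated period=1
callbacks = [chk]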

Then you can call model.fit this way to include the callbacks:

# Train the Model
model_log = model.fit(x=train_generator,
                      validation_data=test_generator,
                      callbacks=callbacks,
                      ...)

Now when you start training, the model is saved every time the validation loss improves, so the saved copy is always the one with the lowest validation loss so far.
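
If the GAN is trained with a hand-written step loop rather than model.fit, another option is to snapshot more often and keep only the last few snapshots, so a collapse at step 19900 can be rolled back to something near step 19000. Below is a minimal sketch of that idea in plain Python. The class name RollingCheckpoints, its methods, and the 500-step interval are all hypothetical choices for illustration, not part of Keras. The state passed in is assumed to be anything picklable, for example the lists returned by get_weights() on the generator and discriminator; optimizer state is not covered here.

```python
import pickle
import tempfile
from collections import deque
from pathlib import Path


class RollingCheckpoints:
    """Hypothetical helper: write a snapshot every `every` steps, keep the last `keep`."""

    def __init__(self, directory, every=500, keep=10):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.every = every
        self.keep = keep
        self.saved = deque()  # (step, path) pairs, oldest first

    def maybe_save(self, step, state):
        """Pickle `state` to disk if `step` falls on the save interval."""
        if step % self.every != 0:
            return None
        path = self.directory / f"step_{step:08d}.pkl"
        with open(path, "wb") as f:
            pickle.dump(state, f)
        self.saved.append((step, path))
        # Delete the oldest snapshots once the window is full.
        while len(self.saved) > self.keep:
            _, old_path = self.saved.popleft()
            old_path.unlink()
        return path

    def load_at_or_before(self, step):
        """Return (saved_step, state) for the newest kept snapshot not after `step`."""
        candidates = [(s, p) for s, p in self.saved if s <= step]
        if not candidates:
            raise LookupError(f"no snapshot at or before step {step}")
        saved_step, path = candidates[-1]
        with open(path, "rb") as f:
            return saved_step, pickle.load(f)


if __name__ == "__main__":
    # Stand-in for a training loop; the dict is a placeholder for real weights.
    with tempfile.TemporaryDirectory() as tmp:
        ckpts = RollingCheckpoints(tmp, every=500, keep=4)
        for step in range(1, 20001):
            ckpts.maybe_save(step, {"step": step})
        saved_step, state = ckpts.load_at_or_before(19100)
        print(saved_step)
```

The trade-off is disk space against how far back you can go: keep times every gives the number of steps the window covers, so it should be wider than the time it usually takes you to notice a collapse.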