Keras EarlyStopping not working, too few epochs

Time: 08-06

I am currently working on a multi-layer 1d-CNN. Recently I shifted my work over to an HPC server to train on both CPU and GPU (NVIDIA).

My code runs beautifully (albeit slowly) on my own laptop with TensorFlow 2.7.3. The HPC server I am using has newer versions of Python (3.9.0) and TensorFlow installed.

Onto my problem: the Keras callback EarlyStopping no longer works as it should on the server. If I set the patience to 5, training stops after only about 5 epochs despite specifying epochs=50 in model.fit(). It seems as if the callback treats the val_loss of the first epoch as the lowest value and counts its patience from there.

I don't know how to fix this. On my own laptop, training would reach its lowest val_loss around epoch 15 and stop around epoch 20. On the server, training time and the number of epochs are not sufficient, and I get very low accuracy (~40%) on the test dataset.
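For reference, the stopping rule that EarlyStopping implements can be sketched in plain Python (the val_loss sequences below are hypothetical, just to show the two behaviours):

```python
def early_stop_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop,
    or len(val_losses) if patience is never exhausted."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Loss improves until epoch 15, then plateaus: stops at epoch 20 (15 + patience).
improving = [1.0 - 0.05 * min(i, 15) for i in range(1, 21)] + [0.25] * 30
print(early_stop_epoch(improving, patience=5))  # 20

# Loss never beats epoch 1 (e.g. a flat or diverging metric):
# patience is exhausted almost immediately, at epoch 6.
flat = [0.5] * 50
print(early_stop_epoch(flat, patience=5))  # 6
```

So stopping right around epoch 5-6 is exactly what you would see if val_loss on the server never improves on its first-epoch value, which points at the metric itself rather than the callback.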

Please help.

CodePudding user response:

So this is the part of the code that I am struggling with:

from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras import callbacks

earlystopping = callbacks.EarlyStopping(monitor="val_loss", mode="min", patience=5, restore_best_weights=True)

model.compile(loss=BinaryCrossentropy(), optimizer="adam", metrics=["accuracy"])

model.summary()

history = model.fit(X_train, Y_train, batch_size=16, epochs=50, callbacks=[earlystopping], verbose=2, validation_data=(X_val, Y_val))

I have no idea why it won't run more than 5 epochs. Perhaps I should specify a baseline value?
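A baseline is one option, but a more defensive EarlyStopping configuration is usually a better first step. The sketch below assumes a recent TensorFlow (start_from_epoch was added in 2.11; min_delta has been available much longer); the values are illustrative, not tuned:

```python
from tensorflow.keras import callbacks

# Sketch of a more defensive configuration:
# - min_delta ignores negligible "improvements" caused by noise
# - start_from_epoch delays the patience countdown entirely
#   (only available in TF >= 2.11)
earlystopping = callbacks.EarlyStopping(
    monitor="val_loss",
    mode="min",
    patience=5,
    min_delta=1e-4,
    start_from_epoch=10,
    restore_best_weights=True,
)
```

If val_loss on the server is NaN or constant from the first epoch, though, no callback settings will help; the fix has to be in the training itself (learning rate, data pipeline, etc.).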

CodePudding user response:

For some reason, reducing my batch_size from 16 to 8 in model.fit() allowed the EarlyStopping callback to work properly. I don't know why, but it works now.
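Whatever the underlying cause, it is worth verifying from the returned History object which epoch was actually best and when training stopped. The sketch below uses a hypothetical val_loss record in place of the real history.history["val_loss"]:

```python
# Hypothetical record, standing in for history.history["val_loss"]
val_loss = [0.69, 0.62, 0.58, 0.55, 0.57, 0.56, 0.58, 0.59, 0.60]

# 1-based index of the epoch with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
stopped_epoch = len(val_loss)
print(f"best epoch: {best_epoch}, stopped after epoch: {stopped_epoch}")
# With patience=5, training stops 5 epochs after the best one, and
# restore_best_weights=True rolls the model back to best_epoch.
```

If best_epoch comes out as 1 on the server run, that confirms val_loss never improved after the first epoch, which is why patience was exhausted so quickly.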
