My CNN model kept getting high accuracy/low loss during training and much lower accuracy/higher loss during validation, so I started suspecting that it's overfitting.
I therefore introduced a few dropout layers as well as some image augmentation. I've also tried monitoring val_loss after each epoch, using ReduceLROnPlateau and EarlyStopping.
Although those measures helped improve validation accuracy a bit, I'm still nowhere close to the desired result and I'm honestly running out of ideas. This is the result I'm obtaining right now:
Epoch 9/30
999/1000 [============================>.] - ETA: 0s - loss: 0.0072 - accuracy: 0.9980
Epoch 9: ReduceLROnPlateau reducing learning rate to 1.500000071246177e-05.
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0072 - accuracy: 0.9980 - val_loss: 2.2994 - val_accuracy: 0.6570 - lr: 1.5000e-04
Epoch 10/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0045 - accuracy: 0.9985 - val_loss: 2.2451 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 11/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0026 - accuracy: 0.9995 - val_loss: 2.6080 - val_accuracy: 0.6540 - lr: 1.5000e-05
Epoch 12/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 2.8192 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 13/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0013 - accuracy: 1.0000 - val_loss: 2.8216 - val_accuracy: 0.6570 - lr: 1.5000e-05
32/32 [==============================] - 1s 23ms/step - loss: 2.8216 - accuracy: 0.6570
Am I wrong to assume that overfitting is still the problem that prevents my model from scoring high on validation and test data?
Or is there something fundamentally wrong with my architecture?
#prevent overfitting, generalize better
data_augmentation = tf.keras.Sequential([
layers.RandomFlip("horizontal_and_vertical"),
layers.RandomRotation(0.2),
layers.RandomZoom(0.2)
])
model = tf.keras.models.Sequential()
model.add(data_augmentation)
#same padding, since edges of the pictures often contain valuable information
model.add(layers.Conv2D(64, (3,3), strides=(1,1), padding='same', activation='relu', input_shape=(64,64,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dropout(0.25))
model.add(layers.Conv2D(32, (3,3), strides=(1,1), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
#prevent overfitting
model.add(layers.Dropout(0.25))
#4 output classes, softmax since we want to end up with probabilities for each class at the end (have to sum up to 1)
model.add(layers.Dense(4, activation='softmax'))
#not using one hot encoding, therefore sparse categorical entropy
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.00015), metrics=['accuracy'])
CodePudding user response:
Try the code below. I would add a BatchNormalization layer right after the Flatten layer:
model.add(layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001))
For the Dense layer, add regularizers (this needs the import from tensorflow.keras import regularizers):
model.add(layers.Dense(128, kernel_regularizer=regularizers.l2(0.016),
                       activity_regularizer=regularizers.l1(0.006),
                       bias_regularizer=regularizers.l1(0.006), activation='relu'))
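For context, this is roughly what the tail of the model would look like with both changes applied. The input shape below is the feature-map shape after the question's two conv/pool blocks (64x64 input halved twice, 32 filters); the regularizer values are the ones suggested above, not tuned constants.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Sketch of the model tail: BatchNormalization after Flatten,
# then a regularized Dense layer, then the original Dropout and softmax head.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(16, 16, 32)),  # shape after the two conv/pool blocks
    layers.Flatten(),
    layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001),
    layers.Dense(128,
                 kernel_regularizer=regularizers.l2(0.016),
                 activity_regularizer=regularizers.l1(0.006),
                 bias_regularizer=regularizers.l1(0.006),
                 activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(4, activation='softmax'),
])
model.summary()
```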
Also, I suggest you use an adjustable learning rate via the Keras callback ReduceLROnPlateau (see the Keras documentation). My recommended code for that is shown below:
rlronp=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.4,
patience=2, verbose=1, mode="auto")
I also recommend you use the Keras callback EarlyStopping (see the Keras documentation). My recommended code for that is below:
estop=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=4,
verbose=1,mode="auto",
restore_best_weights=True)
Before you fit the model, include the code below:
callbacks=[rlronp, estop]
Then, in model.fit, pass callbacks=callbacks.
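Put together, the training call looks roughly like this. The data and the tiny stand-in model below are synthetic placeholders just to make the snippet self-contained; substitute your own images, labels, and the CNN from the question:

```python
import numpy as np
import tensorflow as tf

# The two callbacks recommended above.
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.4,
                                              patience=2, verbose=1, mode="auto")
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                         verbose=1, mode="auto",
                                         restore_best_weights=True)
callbacks = [rlronp, estop]

# Synthetic stand-in data; replace with your real 64x64 RGB images and labels.
x_train = np.random.rand(64, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 4, size=(64,))
x_val = np.random.rand(16, 64, 64, 3).astype("float32")
y_val = np.random.randint(0, 4, size=(16,))

# Minimal stand-in model; use your own CNN here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.00015),
              metrics=["accuracy"])

history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    epochs=2, batch_size=32, callbacks=callbacks, verbose=0)
```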
CodePudding user response:
You can try to add a regularizer to all or some of your layers, for example:
model.add(layers.Conv2D(32, (3,3), strides=(1,1), kernel_regularizer='l1_l2', padding='same', activation = 'relu'))
You could try to replace Dropout with SpatialDropout2D between the conv layers. You could also try more image augmentation, maybe GaussianNoise, RandomContrast, or RandomBrightness.
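A sketch of what that could look like on the question's architecture. The dropout/augmentation rates are illustrative, not tuned; RandomBrightness is left out of the code because it requires a fairly recent TensorFlow (2.9+), but it would slot into the same augmentation Sequential:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Heavier augmentation pipeline than the original flip/rotate/zoom.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
    layers.RandomContrast(0.2),
    layers.GaussianNoise(0.05),  # active during training only
])

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    data_augmentation,
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # SpatialDropout2D drops entire feature maps rather than single activations,
    # which tends to regularize conv layers better than plain Dropout.
    layers.SpatialDropout2D(0.25),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.SpatialDropout2D(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(4, activation='softmax'),
])
```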
Since you have a very high training accuracy, you could also try to simplify your model (fewer units, for example).
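For instance, a lower-capacity variant of the question's model with halved filter counts and a smaller dense layer. The exact sizes here are illustrative, not tuned; the point is that this cuts the parameter count to roughly a quarter of the original, mostly in the Flatten-to-Dense connection:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Reduced-capacity variant: 32/16 filters instead of 64/32, Dense(64) instead
# of Dense(128). Augmentation omitted here for brevity.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(4, activation='softmax'),
])
print(model.count_params())
```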