Is my CNN model still overfitting? If so, how can I combat it? Is there something wrong with my architecture?


My CNN model kept getting high accuracy/low loss during training and much lower accuracy/higher loss during validation, so I started suspecting that it's overfitting.

I have therefore introduced a few dropout layers as well as some image augmentation. I've also tried monitoring val_loss after each epoch, using ReduceLROnPlateau and EarlyStopping.

Although those measures helped improve validation accuracy a bit, I'm still nowhere close to the desired result and I'm honestly running out of ideas. This is the result I'm obtaining right now:

Epoch 9/30
999/1000 [============================>.] - ETA: 0s - loss: 0.0072 - accuracy: 0.9980
Epoch 9: ReduceLROnPlateau reducing learning rate to 1.500000071246177e-05.
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0072 - accuracy: 0.9980 - val_loss: 2.2994 - val_accuracy: 0.6570 - lr: 1.5000e-04
Epoch 10/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0045 - accuracy: 0.9985 - val_loss: 2.2451 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 11/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0026 - accuracy: 0.9995 - val_loss: 2.6080 - val_accuracy: 0.6540 - lr: 1.5000e-05
Epoch 12/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 2.8192 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 13/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0013 - accuracy: 1.0000 - val_loss: 2.8216 - val_accuracy: 0.6570 - lr: 1.5000e-05
32/32 [==============================] - 1s 23ms/step - loss: 2.8216 - accuracy: 0.6570

Am I wrong to assume that overfitting is still the problem that prevents my model from scoring high on validation and test data?

Or is there something fundamentally wrong with my architecture?

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# prevent overfitting, generalize better
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2)
])

model = tf.keras.models.Sequential()
model.add(data_augmentation)
# same padding, since edges of the pictures often contain valuable information
model.add(layers.Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu', input_shape=(64, 64, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.25))
model.add(layers.Conv2D(32, (3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
# prevent overfitting
model.add(layers.Dropout(0.25))

# 4 output classes, softmax since we want probabilities for each class (summing to 1)
model.add(layers.Dense(4, activation='softmax'))
# labels are integer-encoded rather than one-hot, therefore sparse categorical crossentropy
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=0.00015),
              metrics=['accuracy'])

CodePudding user response:

I would add a BatchNormalization layer right after the Flatten layer; try using the code below:

model.add(layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001))

For the Dense layer, add regularizers (this assumes from tensorflow.keras import regularizers):

model.add(layers.Dense(128, kernel_regularizer=regularizers.l2(0.016),
                       activity_regularizer=regularizers.l1(0.006),
                       bias_regularizer=regularizers.l1(0.006), activation='relu'))

I also suggest using an adjustable learning rate via the Keras ReduceLROnPlateau callback (see the Keras documentation for details). My recommended code for that is shown below:

rlronp=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.4,
                                             patience=2, verbose=1, mode="auto")

I also recommend you use the Keras EarlyStopping callback (see the Keras documentation for details). My recommended code for that is below:

estop=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=4,
                                        verbose=1,mode="auto",    
                                        restore_best_weights=True)

Before you fit the model, include the code below:

callbacks=[rlronp, estop]

and in model.fit include callbacks=callbacks.
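Putting it together, a minimal sketch of the fit call (train_ds, val_ds, and the epoch count are placeholders for your own datasets and settings, not values from the original post):

callbacks = [rlronp, estop]
# train_ds and val_ds are assumed to yield (image, label) batches
history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=30,
                    callbacks=callbacks)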

CodePudding user response:

You can try adding regularizers to all or some of your layers, for example:

model.add(layers.Conv2D(32, (3, 3), strides=(1, 1), kernel_regularizer='l1_l2', padding='same', activation='relu'))

You could try replacing Dropout with SpatialDropout2D between the conv layers. You could also try more image augmentation, maybe GaussianNoise, RandomContrast, or RandomBrightness; a rough sketch is below.
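A minimal sketch of those two suggestions, assuming the same tf.keras.layers import as above (the dropout rate, noise level, and augmentation factors are illustrative guesses, and RandomBrightness requires a reasonably recent TensorFlow release):

data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
    layers.RandomContrast(0.2),    # extra augmentation
    layers.RandomBrightness(0.2),  # extra augmentation
    layers.GaussianNoise(0.05)     # only active during training
])

# SpatialDropout2D drops whole feature maps instead of individual activations
model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu', input_shape=(64, 64, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.SpatialDropout2D(0.25))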

Since you have a very high training accuracy, you could also try to simplify your model (fewer units, for example).
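For instance, a minimal sketch of a lower-capacity variant (the filter and unit counts here are illustrative, not tuned recommendations):

small_model = tf.keras.models.Sequential([
    data_augmentation,
    layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(4, activation='softmax')
])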
