Keras - Object detection model - Xception vs. VGG

I'm training an object detection model using pre-trained models from Keras (VGG16, VGG19, Xception, ...) on a dataset of more than 4000 images with YOLO coordinates. Below is the code that handles data pre-processing for the training and validation data, as well as model compilation and training.

For VGG16 and VGG19, I'm resizing the images and YOLO coordinates to the recommended default image size of 224x224, whereas for Xception and InceptionV3 I'm resizing to 299x299.
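A side note on the resizing step, sketched under an assumption: if the labels follow the usual normalized YOLO convention (x_center, y_center, width, height, each divided by the image size), they do not change when the image is resized, so only the image itself needs to be rescaled. The helper names, the box, and the 400x500 image size below are hypothetical and only illustrate the arithmetic.

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to normalized (x_center, y_center, width, height)."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


def resize_box(x_min, y_min, x_max, y_max, img_w, img_h, new_w, new_h):
    """Scale a pixel-space box by the same factors used to resize the image."""
    sx, sy = new_w / img_w, new_h / img_h
    return x_min * sx, y_min * sy, x_max * sx, y_max * sy


# Hypothetical box in a hypothetical 400x500 image.
box = (100, 50, 300, 250)
original = to_yolo(*box, img_w=400, img_h=500)

# Resize image and box to the 299x299 input used for Xception / InceptionV3.
resized_box = resize_box(*box, 400, 500, 299, 299)
resized = to_yolo(*resized_box, img_w=299, img_h=299)

print(original)  # normalized label before resizing
print(resized)   # normalized label after resizing (same up to float rounding)
```

If the labels are stored in pixels instead, they would need to be scaled along with the image, as `resize_box` does.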

I'm freezing all layers of the Keras application and adding just 4 Dense layers on top, which are trained on my dataset to leverage the pre-trained models. With VGG16 or VGG19 this works well: I end up with train and validation accuracy over 92%, and the train / val split seems balanced. However, when I use the Xception or InceptionV3 applications, training always stops early with accuracy around 10%, which I do not understand.

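import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMAGE_SIZE = (299, 299)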

train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)  # val 20%

val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)


train_data = train_datagen.flow_from_dataframe(
    dataframe=df_all, 
    directory=save_dir,                                               
    x_col="image_name", 
    y_col=['yolo_x', 'yolo_y', 'yolo_width', 'yolo_length'], 
    class_mode="raw", 
    target_size=IMAGE_SIZE,
    batch_size=32,
    shuffle=True,
    subset='training'
)

val_data = val_datagen.flow_from_dataframe(
    dataframe=df_all, 
    directory=save_dir,                                               
    x_col="image_name", 
    y_col=['yolo_x', 'yolo_y', 'yolo_width', 'yolo_length'], 
    class_mode="raw", 
    target_size=IMAGE_SIZE,
    batch_size=32,
    shuffle=False,
    subset='validation'
)

from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

learning_rate_reduction = ReduceLROnPlateau(monitor='loss', 
                                            patience=5, 
                                            verbose=2, 
                                            factor=0.5,                                            
                                            min_lr=0.000001)

early_stops = EarlyStopping(monitor='loss', 
                            min_delta=0, 
                            patience=10, 
                            verbose=2, 
                            mode='auto')

checkpointer = ModelCheckpoint(filepath = 'cis3115.{epoch:02d}-{accuracy:.6f}.hdf5',
                               verbose=2,
                               save_best_only=True, 
                               save_weights_only = True)


# Select a pre-trained model Xception
#pretrained_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False ,input_shape=[*IMAGE_SIZE, 3])
pretrained_model = tf.keras.applications.Xception(weights='imagenet', include_top=False ,input_shape=[*IMAGE_SIZE, 3])

# Set the following to False so that the pre-trained weights are not changed 
pretrained_model.trainable = False 

model = Sequential()
model.add(pretrained_model)

# Flatten 2D images into 1D data for final layers like traditional neural network
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))

# The final output layer
# Use Sigmoid when predicting YOLO bounding box since that output is between 0 and 1
model.add(Dense(4, activation='sigmoid'))


print ("Pretrained model used:")
pretrained_model.summary()

print ("Final model created:")
model.summary()

# Compile neural network model
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])


# Train the model with the images in the folders
history = model.fit(
        train_data,
        validation_data=val_data,
        batch_size=16,                  # Number of image batches to process per epoch 
        epochs=100,                      # Number of epochs
        callbacks=[learning_rate_reduction, early_stops],
        )

Xception is a much more complex pre-trained model and should therefore be more precise in theory, so I assume I'm doing something wrong when setting up the CNN.

Why are the Xception / Inception models failing? What should I change?

CodePudding user response：

The issue seems to be the Flatten layer: it creates an enormous number of parameters, and training kept failing. When I replaced Flatten with GlobalAveragePooling2D, it worked pretty well.

Thus, I replaced this:

model.add(Flatten())

With this:

model.add(GlobalAveragePooling2D())
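
To put rough numbers on why the Flatten layer matters here, the sketch below counts the weights in the first Dense(128) layer for each backbone. The feature-map shapes are what I understand these applications to output with include_top=False at their default input sizes (worth confirming with pretrained_model.output_shape); the helper names are made up for illustration.

```python
def dense_params(n_in, n_out):
    """Parameter count of a Dense layer: one weight per input/output pair plus biases."""
    return n_in * n_out + n_out


def first_dense_params(feature_shape, units=128, pooling="flatten"):
    """Parameters in the first Dense layer placed after Flatten or global average pooling."""
    h, w, c = feature_shape
    # Flatten keeps every spatial position; global average pooling keeps one value per channel.
    n_in = h * w * c if pooling == "flatten" else c
    return dense_params(n_in, units)


# Assumed backbone output shapes (height, width, channels) with include_top=False.
backbones = {
    "VGG16 @ 224x224": (7, 7, 512),
    "InceptionV3 @ 299x299": (8, 8, 2048),
    "Xception @ 299x299": (10, 10, 2048),
}

for name, shape in backbones.items():
    flat = first_dense_params(shape, pooling="flatten")
    gap = first_dense_params(shape, pooling="gap")
    print(f"{name}: Flatten -> {flat:,} params, GlobalAveragePooling2D -> {gap:,} params")
```

Under these assumptions, Flatten feeds about 26.2 million weights into the first Dense layer for Xception versus about 3.2 million for VGG16, while global average pooling brings the Xception head down to roughly 262 thousand. Note that GlobalAveragePooling2D also needs to be imported from tensorflow.keras.layers.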