The second epoch's initial loss is not consistently related to the first epoch's final loss

The initial loss of the second epoch has no consistent relation to the final loss of the first epoch. From the second epoch onward, the initial loss is identical in every epoch, and the model's parameters never change. I have some background in deep learning, but this is my first time implementing a model of my own, so I would like to understand intuitively what is going wrong. The dataset consists of cropped face images in two classes, with 300 pictures each. I highly appreciate your help.

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
from tensorflow.keras.layers import ActivityRegularization, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(
    featurewise_center=False, samplewise_center=False,
    featurewise_std_normalization=False, samplewise_std_normalization=False,
    rotation_range=0, width_shift_range=0.0, height_shift_range=0.0,
    brightness_range=None, shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
    horizontal_flip=False, vertical_flip=False, rescale=1./255
)

image = image_generator.flow_from_directory('./util/untitled folder',batch_size=938)

x, y = image.next()
x_train = x[:500]
y_train = y[:500]
x_test = x[500:600]
y_test = y[500:600]

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(4)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(4)

plt.imshow(x_train[0])

def convolutional_model(input_shape):

    input_img = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(64, (7,7), padding='same')(input_img)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=1, padding='same')(x)
    x = Dropout(0.5)(x)
    x = tf.keras.layers.Conv2D(128, (3, 3), padding='same', strides=1)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', strides=4)(x)
    x = tf.keras.layers.Flatten()(x)
    x = ActivityRegularization(l1=0.1, l2=0.2)(x)  # L1/L2 penalty on the flattened activations
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)


    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model


conv_model = convolutional_model((256, 256, 3))
conv_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.SGD(lr=1),
                   metrics=['accuracy'])
conv_model.summary()

conv_model.fit(train_dataset,epochs=100, validation_data=test_dataset)


Epoch 1/100
    2021-12-23 15:06:22.165763: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    2021-12-23 15:06:22.172255: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    125/125 [==============================] - ETA: 0s - loss: 804.6805 - accuracy: 0.4860
    2021-12-23 15:06:50.936870: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    125/125 [==============================] - 35s 275ms/step - loss: 804.6805 - accuracy: 0.4860 - val_loss: 0.7197 - val_accuracy: 0.4980
    Epoch 2/100
    125/125 [==============================] - 34s 270ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980
    Epoch 3/100
    125/125 [==============================] - 34s 276ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980

CodePudding user response:

Since your loss and accuracy are constant, it is very likely that your network is not learning anything: with two classes, it simply predicts the same class every time.
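As a quick sanity check (this calculation is mine, inferred from the numbers in your log), a loss stuck near ln 2 with roughly 50% accuracy is exactly what a constant, near-uniform prediction produces on a balanced two-class dataset:

import math

# Categorical cross-entropy of a constant 50/50 prediction on a
# balanced two-class dataset: -log(0.5) = log(2)
print(math.log(2))  # ~0.6931, close to the reported 0.7197 / 0.7360
# (the remaining gap is plausibly the ActivityRegularization penalty,
# which Keras adds on top of the cross-entropy)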

The activation function, the loss function, and the number of units in the last layer are all correct.

The problem is not related to the way you load the images, but to the learning rate, which is 1. At such a high learning rate, every update overshoots, so the network cannot learn anything.

You should start with a much smaller learning rate, for example 0.0001 or 0.00001, and only then debug the data-loading process if performance is still poor.
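A minimal sketch of that change, reusing the conv_model and datasets from the question (only the optimizer line differs; learning_rate replaces the deprecated lr argument):

conv_model.compile(
    loss=keras.losses.categorical_crossentropy,
    # start small, e.g. 1e-4, and tune from there
    optimizer=keras.optimizers.SGD(learning_rate=1e-4),
    metrics=['accuracy'])
conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset)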

CodePudding user response:

I am quite certain it has something to do with how you load the data, and more specifically the x, y = image.next() part. If you can split the data from ./util/untitled folder into separate folders holding the training and validation data respectively, you can use the same kind of pipeline as in the examples section of the TensorFlow documentation:

train_datagen = ImageDataGenerator(
    featurewise_center=False, 
    samplewise_center=False,
    featurewise_std_normalization=False, 
    samplewise_std_normalization=False,
    rotation_range=0, 
    width_shift_range=0.0, 
    height_shift_range=0.0,
    brightness_range=None, 
    shear_range=0.0, 
    zoom_range=0.0, 
    channel_shift_range=0.0,
    horizontal_flip=False, 
    vertical_flip=False, 
    rescale=1./255)
test_datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(256, 256),
    batch_size=4)
validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(256, 256),
    batch_size=4)
conv_model.fit(
    train_generator,
    epochs=100,
    validation_data=validation_generator)
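
If splitting the folder on disk is inconvenient, an alternative sketch (assuming the single ./util/untitled folder directory from the question) is Keras' built-in validation_split, which carves the validation subset out of the same directory:

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_generator = datagen.flow_from_directory(
    './util/untitled folder',
    target_size=(256, 256),
    batch_size=4,
    subset='training')     # 80% of the images
validation_generator = datagen.flow_from_directory(
    './util/untitled folder',
    target_size=(256, 256),
    batch_size=4,
    subset='validation')   # the remaining 20%

conv_model.fit(
    train_generator,
    epochs=100,
    validation_data=validation_generator)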