Transfer Learning Neural Network is not learning-CodePudding

First of all, I know that there is a similar thread here: https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn

But unfortunately, it does not help. I probably have a bug inside my code which I cannot find. What I am trying to do is to classify some WAV files. But the model does not learn.

At first, I am collecting the files and saving them in an array.

Second, create new directories, one for train data and one for val data. Next, I am reading the WAV files, creating spectrograms, and saving them all to the train directory. Afterward, I am moving 20% of the data from the train directory to the val directory.

Note: While creating the spectrograms I am checking the length of the WAV. If it is too short (less than 2 sec), I am doubling it. Out of this spectrogram, I am cutting a random chunk and saving only this. As a result, all images do have the same height and width.

Then as the next step, I am loading the train and val images. And here I am also doing the normalization.

IMG_WIDTH=300
IMG_HEIGHT=300
IMG_DIM = (IMG_WIDTH, IMG_HEIGHT, 3)

train_files = glob.glob(DBMEL_PATH   "*",recursive=True)
train_imgs = [img_to_array(load_img(img, target_size=IMG_DIM)) for img in train_files]
train_imgs = np.array(train_imgs) / 255                  # normalizing Data
train_labels = [fn.split('\\')[-1].split('.')[1].strip() for fn in train_files]

validation_files = glob.glob(DBMEL_VAL_PATH   "*",recursive=True)
validation_imgs = [img_to_array(load_img(img, target_size=IMG_DIM)) for img in validation_files]
validation_imgs = np.array(validation_imgs) / 255         # normalizing Data
validation_labels = [fn.split('\\')[-1].split('.')[1].strip() for fn in validation_files]

I have checked the variables and printing them. I guess this is working quite well. The arrays contain 80% and respectively 20% of the total data.

#Train dataset shape: (3756, 300, 300, 3) 
#Validation dataset shape: (939, 300, 300, 3)

Next, I have also implemented a One-Hot-Encoder. So far so good. In the next step I create empty DataGenerators, so without any data augmentation. When calling the DataGenerators, one time for train-data and one time for val-data, I'll pass the arrays for images (train_imgs, validation_imgs) and the one-hot-encoded-labels (train_labels_enc, validation_labels_enc).

Okay. Here now comes the tricky part. First, create/load a pre-trained network

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Model
import tensorflow.keras

input_shape=(IMG_HEIGHT,IMG_WIDTH,3)
restnet = ResNet50(include_top=False, weights='imagenet', input_shape=(IMG_HEIGHT,IMG_WIDTH,3))

output = restnet.layers[-1].output
output = tensorflow.keras.layers.Flatten()(output)

restnet = Model(restnet.input, output)

for layer in restnet.layers:
    layer.trainable = False

And now finally creating the model itself. While creating the model I am using the pre-trained network for transfer learning. I guess somewhere there must be a problem.

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, InputLayer
from tensorflow.keras.models import Sequential
from tensorflow.keras import optimizers


model = Sequential()
model.add(restnet)   #  <-- transfer learning
model.add(Dense(512, activation='relu', input_dim=input_shape))# 512 (num_classes)
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(7, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

And the models run with this

history = model.fit_generator(train_generator, 
                              steps_per_epoch=100, 
                              epochs=100,
                              validation_data=val_generator, 
                              validation_steps=10, 
                              verbose=1
                             )

But even after 50 epochs the accuracy stalls at around 0.15

Epoch 1/100
100/100 [==============================] - 711s 7s/step - loss: 10.6419 - accuracy: 0.1530 - val_loss: 1.9416 - val_accuracy: 0.1467
Epoch 2/100
100/100 [==============================] - 733s 7s/step - loss: 1.9595 - accuracy: 0.1550 - val_loss: 1.9372 - val_accuracy: 0.1267
Epoch 3/100
100/100 [==============================] - 731s 7s/step - loss: 1.9940 - accuracy: 0.1444 - val_loss: 1.9388 - val_accuracy: 0.1400
Epoch 4/100
100/100 [==============================] - 735s 7s/step - loss: 1.9416 - accuracy: 0.1535 - val_loss: 1.9380 - val_accuracy: 0.1733
Epoch 5/100
100/100 [==============================] - 737s 7s/step - loss: 1.9394 - accuracy: 0.1656 - val_loss: 1.9345 - val_accuracy: 0.1533
Epoch 6/100
100/100 [==============================] - 741s 7s/step - loss: 1.9364 - accuracy: 0.1667 - val_loss: 1.9286 - val_accuracy: 0.1767
Epoch 7/100
100/100 [==============================] - 740s 7s/step - loss: 1.9389 - accuracy: 0.1523 - val_loss: 1.9305 - val_accuracy: 0.1400
Epoch 8/100
100/100 [==============================] - 737s 7s/step - loss: 1.9394 - accuracy: 0.1623 - val_loss: 1.9441 - val_accuracy: 0.1667
Epoch 9/100
100/100 [==============================] - 735s 7s/step - loss: 1.9391 - accuracy: 0.1582 - val_loss: 1.9458 - val_accuracy: 0.1333
Epoch 10/100
100/100 [==============================] - 734s 7s/step - loss: 1.9381 - accuracy: 0.1602 - val_loss: 1.9372 - val_accuracy: 0.1700
Epoch 11/100
100/100 [==============================] - 739s 7s/step - loss: 1.9392 - accuracy: 0.1623 - val_loss: 1.9302 - val_accuracy: 0.2167
Epoch 12/100
100/100 [==============================] - 741s 7s/step - loss: 1.9368 - accuracy: 0.1627 - val_loss: 1.9326 - val_accuracy: 0.1467
Epoch 13/100
100/100 [==============================] - 740s 7s/step - loss: 1.9381 - accuracy: 0.1513 - val_loss: 1.9312 - val_accuracy: 0.1733
Epoch 14/100
100/100 [==============================] - 736s 7s/step - loss: 1.9396 - accuracy: 0.1542 - val_loss: 1.9407 - val_accuracy: 0.1367
Epoch 15/100
100/100 [==============================] - 741s 7s/step - loss: 1.9393 - accuracy: 0.1597 - val_loss: 1.9336 - val_accuracy: 0.1333
Epoch 16/100
100/100 [==============================] - 737s 7s/step - loss: 1.9375 - accuracy: 0.1659 - val_loss: 1.9354 - val_accuracy: 0.1267
Epoch 17/100
100/100 [==============================] - 741s 7s/step - loss: 1.9422 - accuracy: 0.1487 - val_loss: 1.9307 - val_accuracy: 0.1567
Epoch 18/100
100/100 [==============================] - 738s 7s/step - loss: 1.9399 - accuracy: 0.1680 - val_loss: 1.9408 - val_accuracy: 0.1567
Epoch 19/100
100/100 [==============================] - 743s 7s/step - loss: 1.9405 - accuracy: 0.1610 - val_loss: 1.9335 - val_accuracy: 0.1533
Epoch 20/100
100/100 [==============================] - 738s 7s/step - loss: 1.9410 - accuracy: 0.1575 - val_loss: 1.9331 - val_accuracy: 0.1533
Epoch 21/100
100/100 [==============================] - 746s 7s/step - loss: 1.9395 - accuracy: 0.1639 - val_loss: 1.9344 - val_accuracy: 0.1733
Epoch 22/100
100/100 [==============================] - 746s 7s/step - loss: 1.9393 - accuracy: 0.1585 - val_loss: 1.9354 - val_accuracy: 0.1667
Epoch 23/100
100/100 [==============================] - 746s 7s/step - loss: 1.9398 - accuracy: 0.1599 - val_loss: 1.9352 - val_accuracy: 0.1500
Epoch 24/100
100/100 [==============================] - 746s 7s/step - loss: 1.9392 - accuracy: 0.1585 - val_loss: 1.9449 - val_accuracy: 0.1667
Epoch 25/100
100/100 [==============================] - 746s 7s/step - loss: 1.9399 - accuracy: 0.1495 - val_loss: 1.9352 - val_accuracy: 0.1600

Can anyone please help to find the problem?

CodePudding user response：

I solved the problem on my own.

I exchanged this

model = Sequential()
model.add(restnet)   #  <-- transfer learning
model.add(Dense(512, activation='relu', input_dim=input_shape))# 512 (num_classes)
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(7, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

with this:

base_model = tf.keras.applications.MobileNetV2(input_shape = (224, 224, 3), include_top = False, weights = "imagenet")

model = Sequential()
model.add(base_model)
model.add(tf.keras.layers.GlobalAveragePooling2D())
model.add(Dropout(0.2))
model.add(Dense(number_classes, activation="softmax"))
model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.00001),
              loss="categorical_crossentropy",
              metrics=['accuracy'])
model.summary()

And I found out one more thing. In contrary to some tutorials, using data augmentation is not useful when working with spectrograms. Without data augmentation I got 0.99 on train-accuracy and 0.72 on val-accuracy. But with data augmentation I got only 0.75 on train-accuracy and 0.16 on val-accuracy.