When I train a network with Keras, why is the shape of my predictions wrong?

I'm using TensorFlow Keras to train a model to classify whether an image shows an a or a b. I have 20,000 randomly generated images to use for training (half a, half b).

[Images: an example a image and an example b image]

First, I import the necessary packages:

import tensorflow
from matplotlib import pyplot as plt
import cv2
import random
from tensorflow.keras import models 
from tensorflow.keras import layers 
import numpy as np

After that, I load the images from my folder and process them into arrays containing only 0 and 1. I save each array together with its label: 1 if the image is an a, 0 if the image is a b. Once I have done that, I put everything in one list and shuffle it.

a_letters = []
b_letters = []

folder_path_a = 'C:/path/to/folder/'
folder_path_b = 'C:/path/to/folder/'

count = 0
while count < 10000:
    path = folder_path_a + f'a{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for row_number, row in enumerate(gray_image):
        for collumn_number, collumn in enumerate(row):
            if gray_image[row_number][collumn_number] > 50:
                gray_image[row_number][collumn_number] = 1
            else:
                gray_image[row_number][collumn_number] = 0
    #gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 1]
    a_letters.append(image_and_label)
    count = count + 1

    
count = 0    
while count < 10000:
    path = folder_path_b + f'b{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for row_number, row in enumerate(gray_image):
        for collumn_number, collumn in enumerate(row):
            if gray_image[row_number][collumn_number] > 50:
                gray_image[row_number][collumn_number] = 1
            else:
                gray_image[row_number][collumn_number] = 0
    # gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 0]
    b_letters.append(image_and_label)    
    count = count + 1

    
unified_list = a_letters + b_letters
random.shuffle(unified_list)

Next, I separate the labels and images into their own lists and split them into training and validation data.

images = []
labels = []

for image, label in unified_list:
    images.append(image)
    labels.append(float(label))

x_train = images[:15000]
y_train = labels[:15000]

x_val = images[15000:]
y_val = labels[15000:]

Then I convert the lists into NumPy arrays and expand the dimensions of the labels. (When I first tried to train the model, I got an error saying that logits and labels need to have the same dimensions, so I expanded the labels to match the number of dimensions of the images.)

x_train_array = np.asarray(x_train)
y_train_array = np.asarray(y_train)

x_val_array = np.asarray(x_val)
y_val_array = np.asarray(y_val)

y_train_array = np.expand_dims(y_train_array, axis =1)
y_val_array = np.expand_dims(y_val_array, axis = 1)

Next, I build a model and train it:

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(169,191,)))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(x_train_array, y_train_array, epochs=10, batch_size=500, validation_data=(x_val_array, y_val_array))

Here's the model summary: [image: model summary]

When I try to get predictions with my model by using this code:

predictions = model.predict(x_val_array)

I get a predictions.shape of (5000, 169, 1). It seems that instead of getting one prediction per image, I'm getting 169? I've been working on this for a while and I can't seem to figure it out.

CodePudding user response：

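The 169 comes from the first axis of your input images, i.e. the number of rows (the image height).

It is carried over because a Dense layer only operates on the last axis of its input and treats every other axis like a batch dimension. With an input of shape (batch, 169, 191), each Dense layer transforms only the 191 axis, so the final output has shape (batch, 169, 1).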
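To make this concrete, the shapes can be traced by hand. The sketch below uses a hypothetical helper, dense_output_shape, which only mimics the shape rule of a Dense layer (replace the last axis with the number of units). It is not Keras code, just an illustration of why the 169 never goes away.

```python
def dense_output_shape(input_shape, units):
    """Shape rule of a Dense layer: only the last axis is replaced by `units`."""
    return tuple(input_shape[:-1]) + (units,)


# Trace the layers from the question; None stands for the batch axis.
shape = (None, 169, 191)
for units in (512, 150, 250, 1):
    shape = dense_output_shape(shape, units)
    print(shape)
# The last line printed is (None, 169, 1): one value per image row, not per image.
```

With 5000 validation images this gives exactly the (5000, 169, 1) reported in the question.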

The first thing you could try is to flatten the image before the Dense layers:

Flattening

model = models.Sequential()
model.add(layers.Flatten(input_shape = (169,191,)))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# `example` is assumed to be a batch of 50 images with shape (50, 169, 191)
predictions = model.predict(example)
predictions.shape

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_7 (Flatten)         (None, 32279)             0         
                                                                 
 dense_40 (Dense)            (None, 512)               16527360  
                                                                 
 dense_41 (Dense)            (None, 150)               76950     
                                                                 
 dense_42 (Dense)            (None, 250)               37750     
                                                                 
 dense_43 (Dense)            (None, 1)                 251       
                                                                 
=================================================================
Total params: 16,642,311
Trainable params: 16,642,311
Non-trainable params: 0
_________________________________________________________________
(50, 1)

However, this is not recommended, because the model is far larger than the task warrants. It has about 16.6M parameters, almost all of them in the first Dense layer, which makes it computationally inefficient. With a parameter budget of that size, I would rather use an established architecture such as ResNet-18 (roughly 11M parameters).
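The parameter count in the summary above can be reproduced by hand, which also shows where the size comes from. This is a minimal sketch: dense_params is a hypothetical helper that counts weights plus biases of a fully connected layer, and the layer sizes are taken from the flattened model.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units


# Flattened image size followed by the width of each Dense layer.
layer_sizes = [169 * 191, 512, 150, 250, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [16527360, 76950, 37750, 251]
print(total)      # 16642311
```

The first layer alone accounts for more than 99% of the parameters, because every one of the 32,279 pixels is connected to each of the 512 units.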

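Otherwise, you can take advantage of convolutional layers. Note that Conv2D expects a channel axis, so the input shape becomes (169, 191, 1) and the image arrays need a trailing axis of size 1 (the commented-out np.expand_dims(gray_image, axis=2) in your loading code would add it). Here is an example: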

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(169,191,1)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Flatten())
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# `example` is assumed to be a batch of 50 images with shape (50, 169, 191, 1)
predictions = model.predict(example)
predictions.shape

Model: "sequential_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_24 (Conv2D)          (None, 167, 189, 32)      320       
                                                                 
 conv2d_25 (Conv2D)          (None, 165, 187, 32)      9248      
                                                                 
 max_pooling2d_9 (MaxPooling  (None, 41, 46, 32)       0         
 2D)                                                             
                                                                 
 conv2d_26 (Conv2D)          (None, 39, 44, 32)        9248      
                                                                 
 conv2d_27 (Conv2D)          (None, 37, 42, 32)        9248      
                                                                 
 max_pooling2d_10 (MaxPoolin  (None, 9, 10, 32)        0         
 g2D)                                                            
                                                                 
 flatten_12 (Flatten)        (None, 2880)              0         
                                                                 
 dense_56 (Dense)            (None, 150)               432150    
                                                                 
 dense_57 (Dense)            (None, 250)               37750     
                                                                 
 dense_58 (Dense)            (None, 1)                 251       
                                                                 
=================================================================
Total params: 498,215
Trainable params: 498,215
Non-trainable params: 0
_________________________________________________________________
(50, 1)

It is about 33 times smaller (498,215 versus 16,642,311 parameters), and it will likely perform much better, since convolutional layers are good at extracting features from images.
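The output shapes in the convolutional summary can be checked with a little arithmetic. The sketch below assumes the Keras defaults used in the example: stride 1 and "valid" padding for Conv2D, and a pooling stride equal to the pool size for MaxPool2D. The helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output length of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def max_pool(size, pool=4):
    """Output length of non-overlapping pooling (incomplete windows are dropped)."""
    return size // pool


def trace(size):
    """Apply the conv, conv, pool, conv, conv, pool stack to one spatial axis."""
    for step in (conv_valid, conv_valid, max_pool, conv_valid, conv_valid, max_pool):
        size = step(size)
    return size


height, width = trace(169), trace(191)
flattened = height * width * 32  # 32 filters in the last Conv2D layer

print(height, width, flattened)           # 9 10 2880
print(round(16_642_311 / 498_215, 1))     # 33.4
```

The two pooling layers shrink each image to 9 x 10 feature maps, so the first Dense layer sees 2,880 inputs instead of 32,279, which is where most of the savings come from.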