I'm using TensorFlow Keras to train a model to classify whether an image is an a or a b. I have 20,000 randomly generated images to use for training (half a, half b). [image: example of an a image] [image: example of a b image]

First, I import the necessary packages:

import tensorflow
from matplotlib import pyplot as plt
import cv2
import random
from tensorflow.keras import models 
from tensorflow.keras import layers 
import numpy as np

After that, I load the images from my folder and process them into arrays containing only 0 and 1. I save each array together with a label: 1 if the image is an a, 0 if the image is a b. Once that is done, I put them all in one list and shuffle it.

a_letters = []
b_letters = []

folder_path_a = 'C:/path/to/folder/'
folder_path_b = 'C:/path/to/folder/'

count = 0
while count < 10000:
    path = folder_path_a + f'a{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for row_number, row in enumerate(gray_image):
        for collumn_number, collumn in enumerate(row):
            if gray_image[row_number][collumn_number] > 50:
                gray_image[row_number][collumn_number] = 1
            else:
                gray_image[row_number][collumn_number] = 0
    #gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 1]
    a_letters.append(image_and_label)
    count = count + 1

count = 0    
while count < 10000:
    path = folder_path_b + f'b{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for row_number, row in enumerate(gray_image):
        for collumn_number, collumn in enumerate(row):
            if gray_image[row_number][collumn_number] > 50:
                gray_image[row_number][collumn_number] = 1
            else:
                gray_image[row_number][collumn_number] = 0
    # gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 0]
    b_letters.append(image_and_label)
    count = count + 1

unified_list = a_letters + b_letters
random.shuffle(unified_list)
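
As an aside, the per-pixel loops above can be expressed as one vectorized NumPy comparison, which is usually much faster across 20,000 images. This is a minimal sketch assuming the same threshold of 50; `binarize` is a hypothetical helper name, and the small array stands in for a grayscale image returned by `cv2.cvtColor`:

```python
import numpy as np

def binarize(gray_image, threshold=50):
    """Return an array of 0s and 1s: 1 where the pixel is above the threshold."""
    return (gray_image > threshold).astype(np.uint8)

# Stand-in for a grayscale image (a real one would come from cv2.cvtColor).
sample = np.array([[0, 50, 51],
                   [200, 10, 255]], dtype=np.uint8)
binary = binarize(sample)
print(binary)
```

Pixels equal to the threshold map to 0, matching the `> 50` test in the loops.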

Next, I separate the labels and images into their own lists and split them into training and validation data.

images = []
labels = []

for image, label in unified_list:
    images.append(image)
    labels.append(label)

x_train = images[:15000]
y_train = labels[:15000]

x_val = images[15000:]
y_val = labels[15000:]

Then I convert the lists into numpy arrays and expand the dimensions of the labels. (When I tried to train the model before, I got an error saying that logits and labels need to have the same dimension, so I expanded the labels to give them the same number of dimensions as the images.)

x_train_array = np.asarray(x_train)
y_train_array = np.asarray(y_train)

x_val_array = np.asarray(x_val)
y_val_array = np.asarray(y_val)

y_train_array = np.expand_dims(y_train_array, axis =1)
y_val_array = np.expand_dims(y_val_array, axis = 1)

Next, I build a model and train it:

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(169,191,)))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(x_train_array, y_train_array, epochs=10, batch_size=500, validation_data=(x_val_array, y_val_array))

Here's the model summary: [image: model summary]

When I try to get predictions with my model by using this code:

predictions = model.predict(x_val_array)

I get a predictions.shape of (5000, 169, 1). It seems that instead of getting one prediction per image, I'm getting 169? I've been working on this for a while and I can't seem to figure it out.

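The 169 comes from the first dimension of your input image (each image is a 169 x 191 array).

It is carried over because a Dense layer only acts on the last axis of its input tensor and leaves every other axis untouched. Your input of shape (None, 169, 191) therefore becomes (None, 169, 512) after the first layer and (None, 169, 1) after the last one.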
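
The shape behaviour can be checked without building a model. The sketch below emulates in NumPy what a Dense(1) layer computes (a matrix product over the last axis plus a bias); the random arrays and variable names are stand-ins for illustration, not your data or Keras internals:

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.random((5, 169, 191))      # 5 stand-in images, not flattened
kernel = rng.random((191, 1))          # weights of a Dense(1) layer
bias = np.zeros(1)

# The product contracts only the last axis, so the 169 axis survives.
unflattened_out = batch @ kernel + bias
print(unflattened_out.shape)           # (5, 169, 1)

# Flattening first merges 169 x 191 into one axis of length 32279.
flat = batch.reshape(len(batch), -1)
flat_kernel = rng.random((flat.shape[1], 1))
flattened_out = flat @ flat_kernel + bias
print(flattened_out.shape)             # (5, 1)
```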

The first thing you could try is to flatten the image:


model = models.Sequential()
model.add(layers.Flatten(input_shape = (169,191,)))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
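model.summary()

# `example` is a batch of 50 images with shape (50, 169, 191)
predictions = model.predict(example)
print(predictions.shape)

Model: "sequential_12"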
 Layer (type)                Output Shape              Param #   
 flatten_7 (Flatten)         (None, 32279)             0         
 dense_40 (Dense)            (None, 512)               16527360  
 dense_41 (Dense)            (None, 150)               76950     
 dense_42 (Dense)            (None, 250)               37750     
 dense_43 (Dense)            (None, 1)                 251       
Total params: 16,642,311
Trainable params: 16,642,311
Non-trainable params: 0
(50, 1)

However, this is not recommended, because the model is far larger than a task this simple needs. It has about 16.6M parameters, almost all of them in the first Dense layer, which makes it inefficient to compute. With a parameter budget that large, I would rather use something like ResNet-18, which has roughly 11M parameters.
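
The parameter count follows directly from the layer sizes: each Dense layer has one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic (`dense_params` is a hypothetical helper, not a Keras function):

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Flattened input size, followed by the width of each Dense layer.
layer_sizes = [169 * 191, 512, 150, 250, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [16527360, 76950, 37750, 251]
print(total)      # 16642311
```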

Alternatively, you can take advantage of convolutional layers. Here is an example:

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(169,191,1)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Flatten())
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# `example` is a batch of 50 images with shape (50, 169, 191, 1)
predictions = model.predict(example)
print(predictions.shape)

Model: "sequential_17"
 Layer (type)                Output Shape              Param #   
 conv2d_24 (Conv2D)          (None, 167, 189, 32)      320       
 conv2d_25 (Conv2D)          (None, 165, 187, 32)      9248      
 max_pooling2d_9 (MaxPooling  (None, 41, 46, 32)       0         
 conv2d_26 (Conv2D)          (None, 39, 44, 32)        9248      
 conv2d_27 (Conv2D)          (None, 37, 42, 32)        9248      
 max_pooling2d_10 (MaxPoolin  (None, 9, 10, 32)        0         
 flatten_12 (Flatten)        (None, 2880)              0         
 dense_56 (Dense)            (None, 150)               432150    
 dense_57 (Dense)            (None, 250)               37750     
 dense_58 (Dense)            (None, 1)                 251       
Total params: 498,215
Trainable params: 498,215
Non-trainable params: 0
(50, 1)

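It is about 33 times smaller (roughly 0.5M parameters versus 16.6M), and it should still perform much better, since convolutional layers are good at extracting features from images. Note that this model expects a channel axis, so the input must have shape (N, 169, 191, 1), which you can get with np.expand_dims(x, axis=-1).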
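
The output shapes in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used above: 'valid' padding with stride 1 for Conv2D, and a pooling stride equal to the pool size. The helper names are hypothetical:

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool(size, window=4):
    """Output length of max pooling whose stride equals its window."""
    return size // window

def trace(size):
    """Follow one spatial dimension through conv, conv, pool, conv, conv, pool."""
    size = pool(conv_valid(conv_valid(size)))
    size = pool(conv_valid(conv_valid(size)))
    return size

height, width = trace(169), trace(191)
flattened = height * width * 32   # 32 filters in the last Conv2D layer
print(height, width, flattened)   # 9 10 2880
```

The Flatten layer therefore feeds only 2880 values into the first Dense layer, instead of 32279 in the fully connected version.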

