I'm using TensorFlow Keras to train a model to classify whether an image is an a or a b. I have 20,000 randomly generated images to use for training (half a, half b). [image: example of an a image] [image: example of a b image]
First, I import the necessary packages:
import tensorflow
from matplotlib import pyplot as plt
import cv2
import random
from tensorflow.keras import models
from tensorflow.keras import layers
import numpy as np
After that, I load the images from my folder and process them: I turn each one into an array of only 0s and 1s and pair it with a label, 1 if the image is an a and 0 if it is a b. Once I have done that, I put them all in one list and shuffle it.
a_letters = []
b_letters = []
folder_path_a = 'C:/path/to/folder/'
folder_path_b = 'C:/path/to/folder/'
count = 0
while count < 10000:
    path = folder_path_a + f'a{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Binarize: pixels brighter than 50 become 1, everything else 0
    for row_number, row in enumerate(gray_image):
        for column_number, column in enumerate(row):
            if gray_image[row_number][column_number] > 50:
                gray_image[row_number][column_number] = 1
            else:
                gray_image[row_number][column_number] = 0
    # gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 1]  # label 1 = a
    a_letters.append(image_and_label)
    count = count + 1
count = 0
while count < 10000:
    path = folder_path_b + f'b{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Same binarization as for the a images
    for row_number, row in enumerate(gray_image):
        for column_number, column in enumerate(row):
            if gray_image[row_number][column_number] > 50:
                gray_image[row_number][column_number] = 1
            else:
                gray_image[row_number][column_number] = 0
    # gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 0]  # label 0 = b
    b_letters.append(image_and_label)
    count = count + 1
# Combine both classes and shuffle so the order is random
unified_list = a_letters + b_letters
random.shuffle(unified_list)
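The nested pixel loops above can be slow over 20,000 images. A possible alternative, sketched here on a small synthetic array rather than the real files, is to threshold the whole array at once with NumPy. The `binarize` helper name is hypothetical; the threshold of 50 is taken from the code above.

```python
import numpy as np

def binarize(gray_image, threshold=50):
    """Return an array of 0s and 1s: 1 where the pixel is above the threshold."""
    # The comparison yields a boolean array; astype converts it to 0/1 integers.
    return (gray_image > threshold).astype(np.uint8)

# Synthetic stand-in for the output of cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.array([[0, 50, 51],
                 [255, 10, 200]], dtype=np.uint8)
binary = binarize(gray)
print(binary)
```

The result has the same shape as the input, so it could be dropped in where the two `for` loops are now.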
Next, I separate the labels and images into their own lists and split them into training and validation data.
images = []
labels = []
for image, label in unified_list:
    images.append(image)
    labels.append(float(label))
x_train = images[:15000]
y_train = labels[:15000]
x_val = images[15000:]
y_val = labels[15000:]
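As a side note, the same shuffle-and-split step could also be done directly on NumPy arrays by shuffling an index array and applying it to both images and labels, which keeps the pairs aligned. This is only a sketch on tiny placeholder data; the sizes, the seed, and the `split_at` name are illustrative and not taken from the code above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only to make the sketch repeatable

# Placeholder data: 8 tiny "images" whose pixels all equal their label
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=np.float32)
images = np.stack([np.full((3, 4), v) for v in labels])

# One permutation applied to both arrays keeps every image with its label
order = rng.permutation(len(images))
images, labels = images[order], labels[order]

split_at = 6  # first 6 samples for training, the rest for validation
x_train, x_val = images[:split_at], images[split_at:]
y_train, y_val = labels[:split_at], labels[split_at:]
print(x_train.shape, y_train.shape, x_val.shape, y_val.shape)
```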
Then I convert the lists into NumPy arrays and expand the dimensions of the labels. (When I first tried to train the model, I got an error saying that logits and labels need to have the same dimensions, so I expanded the labels to try to make them match.)
x_train_array = np.asarray(x_train)
y_train_array = np.asarray(y_train)
x_val_array = np.asarray(x_val)
y_val_array = np.asarray(y_val)
y_train_array = np.expand_dims(y_train_array, axis=1)
y_val_array = np.expand_dims(y_val_array, axis=1)
Next, I build a model and train it:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(169,191,)))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train_array, y_train_array, epochs=10, batch_size=500, validation_data=(x_val_array, y_val_array))
Here's the model summary: [image: model summary]
When I try to get predictions with my model by using this code:
predictions = model.predict(x_val_array)
I get a predictions.shape of (5000, 169, 1). It seems that instead of getting one prediction per image, I'm getting 169? I've been working on this for a while and I can't seem to figure it out.
CodePudding user response:
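The 169 comes from the first axis of your input, which is the image height (the number of rows), since input_shape is (169, 191).
It is carried over because a Dense layer only acts on the last axis of its input: it maps the 191 values in each row to 512, then 150, then 250 and finally 1, while the 169 rows pass through untouched. The output is therefore (None, 169, 1) instead of (None, 1).
The first thing you could try is to flatten your image: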
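To see the effect without Keras, here is a rough NumPy sketch of what a stack of Dense layers does to a 3-D input. It assumes random weights and leaves out the activations, since only the shapes matter here; the batch size of 4 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((4, 169, 191))  # 4 images, same per-image shape as in the question

# Units per Dense layer in the question's model
units = [512, 150, 250, 1]

x = batch
for n_out in units:
    n_in = x.shape[-1]
    weights = rng.standard_normal((n_in, n_out)) * 0.01
    bias = np.zeros(n_out)
    # A Dense layer computes x @ W + b over the last axis only (activation omitted)
    x = x @ weights + bias
    print(x.shape)

# Flattening first merges the 169 x 191 pixels into a single feature axis
flat = batch.reshape(len(batch), -1)
print(flat.shape)
```

The loop ends with one value per row of each image, which is where the extra 169 in the prediction shape comes from, while the flattened batch has a single feature axis per image.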
Flattening
model = models.Sequential()
# Flatten turns each (169, 191) image into a vector of 169 * 191 = 32279 values
model.add(layers.Flatten(input_shape=(169, 191)))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# example is not defined in this snippet; the output below implies a batch of 50 images, i.e. shape (50, 169, 191)
predictions = model.predict(example)
predictions.shape
Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_7 (Flatten)          (None, 32279)             0
dense_40 (Dense)             (None, 512)               16527360
dense_41 (Dense)             (None, 150)               76950
dense_42 (Dense)             (None, 250)               37750
dense_43 (Dense)             (None, 1)                 251
=================================================================
Total params: 16,642,311
Trainable params: 16,642,311
Non-trainable params: 0
_________________________________________________________________
(50, 1)
However, this is not recommended, because the model is far larger than a two-class problem on small binary images needs. It has about 16.6M parameters, almost all of them in the first Dense layer, which makes it inefficient to compute. With a parameter budget that large, I would rather use an established architecture such as ResNet-18 (roughly 11-12M parameters).
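The parameter counts in the summary above can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output. A small sketch (the `dense_params` helper is just an illustrative name):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Flattened input size, then the units of each Dense layer
sizes = [169 * 191, 512, 150, 250, 1]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(per_layer)
print(per_layer)  # [16527360, 76950, 37750, 251]
print(total)      # 16642311
```

Nearly all of the parameters sit in the first layer, because every one of the 32279 pixels is connected to each of the 512 units.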
Otherwise, you can take advantage of convolution layers. Here is an example:
model = models.Sequential()
# Conv2D expects a channel axis, so each image must have shape (169, 191, 1)
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(169, 191, 1)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Flatten())
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# example is not defined in this snippet; the output below implies a batch of 50 images, i.e. shape (50, 169, 191, 1)
predictions = model.predict(example)
predictions.shape
Model: "sequential_17"
_________________________________________________________________
Layer (type)                     Output Shape              Param #
=================================================================
conv2d_24 (Conv2D)               (None, 167, 189, 32)      320
conv2d_25 (Conv2D)               (None, 165, 187, 32)      9248
max_pooling2d_9 (MaxPooling2D)   (None, 41, 46, 32)        0
conv2d_26 (Conv2D)               (None, 39, 44, 32)        9248
conv2d_27 (Conv2D)               (None, 37, 42, 32)        9248
max_pooling2d_10 (MaxPooling2D)  (None, 9, 10, 32)         0
flatten_12 (Flatten)             (None, 2880)              0
dense_56 (Dense)                 (None, 150)               432150
dense_57 (Dense)                 (None, 250)               37750
dense_58 (Dense)                 (None, 1)                 251
=================================================================
Total params: 498,215
Trainable params: 498,215
Non-trainable params: 0
_________________________________________________________________
(50, 1)
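It is about 33 times smaller than the flattened model (498,215 vs. 16,642,311 parameters), and it will most likely perform much better, since convolutional layers are good at extracting spatial features. Note that the input now needs a trailing channel axis, i.e. shape (169, 191, 1), which is what the commented-out np.expand_dims(gray_image, axis=2) line in your loading code would give you.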
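If you want to check the shapes and parameter counts in that summary yourself, the arithmetic can be sketched in plain Python. This assumes the defaults used above (3x3 kernels, stride 1, 'valid' padding, 4x4 pooling); the helper names are only illustrative.

```python
def conv2d(shape, filters, k=3):
    """Output shape and parameter count of a 'valid' k x k convolution, stride 1."""
    h, w, c = shape
    n_params = k * k * c * filters + filters  # kernel weights plus one bias per filter
    return (h - k + 1, w - k + 1, filters), n_params

def max_pool(shape, p=4):
    """Output shape of non-overlapping p x p max pooling with 'valid' padding."""
    h, w, c = shape
    return (h // p, w // p, c)

def dense(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

shape, total = (169, 191, 1), 0
for step in ["conv", "conv", "pool", "conv", "conv", "pool"]:
    if step == "conv":
        shape, n_params = conv2d(shape, 32)
        total += n_params
    else:
        shape = max_pool(shape)
    print(step, shape)

# Flatten, then the three Dense layers
flat = shape[0] * shape[1] * shape[2]
for n_in, n_out in [(flat, 150), (150, 250), (250, 1)]:
    total += dense(n_in, n_out)
print(flat, total)                    # 2880 498215
print(round(16_642_311 / total, 1))   # ratio to the flattened model's parameter count
```

The two pooling layers shrink the feature map to 9 x 10 before it is flattened, which is why the first Dense layer here needs far fewer weights than in the flattened model.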