I'm trying to create an image classifier on a dataset with 40,000 images, so that AutoKeras can then search for the most appropriate model for me. The problem: loading all the images and extracting their labels works, but when I run the normalization, Google Colab runs out of RAM (even though I have a Pro account). Here is my code:
# Import TensorFlow
%tensorflow_version 2.x
import tensorflow as tf
import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import normalize, to_categorical
!pip install autokeras
import autokeras as ak
import glob
import os
import cv2
import numpy as np
images = glob.glob(path + '/*.png')  # path points to the image folder
data = []
labels = []
for i in images:
    image = tf.keras.preprocessing.image.load_img(i, color_mode='rgb')
    image = np.array(image, dtype='float32')
    image = cv2.resize(image, (180, 180))
    image /= 255.0
    data.append(image)
    # the label is the first token of the file name
    label = os.path.basename(i).replace('.png', '')
    label = label.split()[0]
    labels.append(label)
data = np.array(data)
labels = np.array(labels)
print(labels)
Up until here everything works like a charm, but then comes the part that causes the overflow:
# normalize features and encode labels
X = data
y = np.zeros(labels.shape)
indices = np.unique(labels)
for i in range(labels.shape[0]):
    y[i] = np.where(labels[i] == indices)[0]
y = to_categorical(y)
from sklearn.model_selection import train_test_split
# this line seems to be where Colab runs out of RAM
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = ak.ImageClassifier(overwrite=True, max_trials=20)
clf.fit(X_train, y_train, epochs=10)  # this line also seems to trigger the RAM overflow
Does anyone know what the problem might be? A hint in any direction would be highly appreciated, as this is driving me crazy!
Answer:
I highly recommend tf.data.Dataset for creating the dataset:
- Do all the processing you want on the images (such as resizing and normalizing) with dataset.map (see the lazy-loading sketch right after this list).
- Instead of using train_test_split, use dataset.take and dataset.skip to split the dataset.
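Since the RAM overflow comes from decoding all 40,000 images into one big array up front, a useful variant is to build the dataset from the file paths, so each image is read and decoded only when its batch is consumed. A minimal sketch under the question's assumptions (PNG files in path, label encoded as the first token of the file name); the helper name load_image is mine:
import os
import glob
import tensorflow as tf
image_paths = glob.glob(path + '/*.png')  # `path` as in the question
# label = first token of the file name, as in the question
names = [os.path.basename(p).replace('.png', '').split()[0] for p in image_paths]
classes = sorted(set(names))
class_to_index = {c: i for i, c in enumerate(classes)}
label_ids = [class_to_index[n] for n in names]
def load_image(file_path, label):
    # the file is read and decoded only when this element is actually needed
    image = tf.image.decode_png(tf.io.read_file(file_path), channels=3)
    image = tf.image.resize(image, (180, 180)) / 255.0
    return image, tf.one_hot(label, len(classes))
dataset = tf.data.Dataset.from_tensor_slices((image_paths, label_ids))
dataset = dataset.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)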
Code for generating random images and labels:
# !pip install autokeras
import tensorflow as tf
import autokeras as ak
import numpy as np
data = np.random.randint(0, 255, (45_000, 32, 32, 3))
label = np.random.randint(0, 10, 45_000)
label = tf.keras.utils.to_categorical(label)
Convert data and label to a tf.data.Dataset and process them (only about 55 ms for all 45,000 images, benchmarked on Colab):
dataset = tf.data.Dataset.from_tensor_slices((data, label))
def resize_normalize_preprocess(image, label):
image = tf.image.resize(image, (16, 16))
image = image / 255.0
return image, label
# %%timeit
dataset = dataset.map(resize_normalize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
# 1 loop, best of 5: 54.9 ms per loop
- Split the dataset into 80% for train and 20% for test.
- Train and evaluate the AutoKeras ImageClassifier.
dataset_size = len(dataset)
train_size = int(0.8 * dataset_size)
test_size = dataset_size - train_size
dataset = dataset.shuffle(32, reshuffle_each_iteration=False)  # fixed shuffle so take/skip yield disjoint splits
train_dataset = dataset.take(train_size)
test_dataset = dataset.skip(train_size)
print(f'Size dataset : {len(dataset)}')
print(f'Size train_dataset : {len(train_dataset)}')
print(f'Size test_dataset : {len(test_dataset)}')
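# (Optional addition) Batch and prefetch so only a few batches are resident
# in RAM at a time; the batch size of 32 is an assumption that matches the
# default batching visible in the output below.
train_dataset = train_dataset.batch(32).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(32).prefetch(tf.data.AUTOTUNE)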
clf = ak.ImageClassifier(overwrite=True, max_trials=1)
clf.fit(train_dataset, epochs=1)
print(clf.evaluate(test_dataset))
Output:
Size dataset : 45000
Size train_dataset : 36000
Size test_dataset : 9000
Search: Running Trial #1
Value |Best Value So Far |Hyperparameter
vanilla |? |image_block_1/block_type
True |? |image_block_1/normalize
False |? |image_block_1/augment
3 |? |image_block_1/conv_block_1/kernel_size
1 |? |image_block_1/conv_block_1/num_blocks
2 |? |image_block_1/conv_block_1/num_layers
True |? |image_block_1/conv_block_1/max_pooling
False |? |image_block_1/conv_block_1/separable
0.25 |? |image_block_1/conv_block_1/dropout
32 |? |image_block_1/conv_block_1/filters_0_0
64 |? |image_block_1/conv_block_1/filters_0_1
flatten |? |classification_head_1/spatial_reduction_1/reduction_type
0.5 |? |classification_head_1/dropout
adam |? |optimizer
0.001 |? |learning_rate
Result of the search for the best hyperparameters and of training:
Trial 1 Complete [00h 01m 16s]
val_loss: 2.3030436038970947
Best val_loss So Far: 2.3030436038970947
Total elapsed time: 00h 01m 16s
INFO:tensorflow:Oracle triggered exit
1125/1125 [==============================] - 68s 60ms/step - loss: 2.3072 - accuracy: 0.0979
INFO:tensorflow:Assets written to: ./image_classifier/best_model/assets
282/282 [==============================] - 26s 57ms/step - loss: 2.3025 - accuracy: 0.0970
[2.302501916885376, 0.09700000286102295]
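Once the search finishes, you can export the best model as a plain Keras model for later reuse (a minimal sketch; the save path is my choice):
model = clf.export_model()  # best model found during the search, as a Keras model
model.summary()
model.save('best_autokeras_model', save_format='tf')  # SavedModel format keeps AutoKeras' custom layers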