I am creating a food dataset to train my model. I use the glob module to list the images in a directory and matplotlib to read each image's data, but sometimes this freezes my laptop and forces me to shut it down!
The following code (sometimes) causes this:
import glob
import matplotlib.image as mpimg

data_dir = "/home/moumenshobaky/tensorflow_files/virtualenv/archive/training/Egg/*"
train_x = []
for filename in glob.glob(data_dir):
    train_x.append(mpimg.imread(filename).tolist())

data_dir = "/home/moumenshobaky/tensorflow_files/virtualenv/archive/training/Meat/*"
for filename2 in glob.glob(data_dir):
    train_x.append(mpimg.imread(filename2).tolist())
and sometimes this line causes the same thing as well:
train_x_float = [pixel / 255.0 for pixel in train_x]
CodePudding user response:
You are creating a single list that stores all the images on your drive, so the freezing is expected: you are using a lot of RAM. Consider using generators to prepare the data for your training; otherwise, just load part of your dataset if you still want to do some analysis. There is also a small overhead from always converting each loaded image (a NumPy array) to a list with .tolist(); you can simply drop that conversion and keep the arrays.
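As a rough illustration of the generator idea (not your exact code: `load_fn` here is a stand-in for `mpimg.imread`, and the paths are made up), something like this keeps only one batch of images in memory at a time:

```python
import numpy as np

def batch_generator(paths, batch_size, load_fn):
    """Yield batches of images lazily instead of holding every
    image in RAM at once. Only `batch_size` images are decoded
    per step; earlier batches can be garbage-collected."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        yield np.stack([load_fn(p) for p in chunk])

# Illustrative usage with a fake loader instead of real image files.
fake_load = lambda p: np.zeros((4, 4, 3), dtype=np.float32)
paths = [f"img_{i}.png" for i in range(10)]
batches = list(batch_generator(paths, batch_size=4, load_fn=fake_load))
```

In training you would iterate over the generator instead of materializing the whole list, so peak memory stays proportional to the batch size, not the dataset size.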
You could use this snippet to check the memory usage (note that getsizeof reports only the shallow size of the list object itself, not the nested image data):

from sys import getsizeof

size_of_list_in_bytes = getsizeof(train_x)
size_of_list_in_mb = size_of_list_in_bytes / 1_000_000
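If you keep the images as NumPy arrays instead of lists, you can measure their memory exactly via `ndarray.nbytes`; a quick sketch with a dummy array (the shape is just an example):

```python
import numpy as np

# One dummy 224x224 RGB image in float32 (shape is illustrative).
img = np.zeros((224, 224, 3), dtype=np.float32)
print(img.nbytes)  # exact size of the array's data buffer in bytes

# For a list of arrays, sum the individual buffers:
images = [img] * 10
total_mb = sum(a.nbytes for a in images) / 1_000_000
```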
As for the generator usage, if you use TensorFlow or Keras you could have a look at Sequence (https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) or ImageDataGenerator. It really depends on the task; I recommend the former as it has more flexibility. Last but not least, tf.data.Dataset is becoming more and more widespread. Each of these provides built-in generators.
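To make the Sequence idea concrete, here is a minimal sketch of the pattern without the TensorFlow dependency (a real implementation would subclass tf.keras.utils.Sequence; the class name FoodSequence, load_fn, and the file names are all made up for illustration):

```python
import math
import numpy as np

class FoodSequence:
    """Sketch of the tf.keras.utils.Sequence pattern: batches are
    built on demand in __getitem__, so only one batch lives in RAM."""

    def __init__(self, paths, labels, batch_size, load_fn):
        self.paths, self.labels = paths, labels
        self.batch_size = batch_size
        self.load_fn = load_fn  # stand-in for mpimg.imread

    def __len__(self):
        # Number of batches per epoch.
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        x = np.stack([self.load_fn(p) for p in self.paths[lo:hi]])
        y = np.array(self.labels[lo:hi])
        return x / 255.0, y  # normalize per batch, not the whole dataset

# Illustrative usage with a fake loader instead of real image files.
fake_load = lambda p: np.full((8, 8, 3), 255, dtype=np.float32)
seq = FoodSequence([f"egg_{i}.png" for i in range(5)], [0] * 5,
                   batch_size=2, load_fn=fake_load)
x, y = seq[0]
```

Note that the per-batch division by 255.0 also replaces the big train_x_float list from the question: normalization happens on each small batch as it is requested, never on the full dataset at once.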