Best practice for loading a huge image dataset for ML


I'm playing around with an image dataset on Kaggle (https://www.kaggle.com/competitions/paddy-disease-classification/data). The dataset contains about 10,000 images at 480×640 resolution.
When I try to load the dataset with the following code,

data = []
data_label = []
for (label, file) in dataset_file_img(dataset_path):
    image = load_img_into_tensor(file)
    data.append(image / 255)   # normalize pixel values to [0, 1]
    data_label.append(label)

it consumes about 20 GB of RAM.

What is the best practice for loading a dataset like this?
Any help would be appreciated!

CodePudding user response:

Try the following from Keras:

  1. the ImageDataGenerator class

  2. the image_dataset_from_directory function

Both read images from disk in batches instead of loading the entire dataset into memory at once; a sketch of the second option follows below.
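
A minimal sketch of option 2, assuming the Kaggle layout of one subdirectory per class under train_images (the path, image size, and batch size here are illustrative placeholders):

import tensorflow as tf

# Streams batches from disk instead of decoding all ~10,000 images up front.
# Assumes a train_images/<class_name>/<file>.jpg directory layout.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "train_images",          # placeholder path
    image_size=(480, 640),   # resized on the fly; match your model's input
    batch_size=32,
    label_mode="int",
)

# Rescale to [0, 1] per batch, mirroring the image/255 step in the question.
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))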

CodePudding user response:

If you have enough GPU compute, ImageDataGenerator will probably become the bottleneck. As Shubham suggested, try tf.data, which is the best option as far as I know: it decodes and preprocesses images in a streaming pipeline, so only a few batches are ever held in RAM at once.
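
A rough sketch of such a tf.data input pipeline (the directory layout, glob pattern, and batch size are assumptions for illustration, not part of the original answer):

import pathlib

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
data_dir = pathlib.Path("train_images")  # placeholder path

# One subdirectory per class, as in the Kaggle dataset layout.
paths = sorted(str(p) for p in data_dir.glob("*/*.jpg"))
class_names = sorted({pathlib.Path(p).parent.name for p in paths})
labels = [class_names.index(pathlib.Path(p).parent.name) for p in paths]

def load_image(path, label):
    # Read and decode a single image lazily, per element.
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # also scales to [0, 1]
    return img, label

ds = (
    tf.data.Dataset.from_tensor_slices((paths, labels))
    .shuffle(len(paths))
    .map(load_image, num_parallel_calls=AUTOTUNE)
    .batch(32)            # only a few decoded batches live in memory at a time
    .prefetch(AUTOTUNE)   # overlap preprocessing with training on the GPU
)

The resulting ds can be passed directly to model.fit, since Keras accepts tf.data datasets as input.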
