I have a dataset containing 3000 images in train and 6000 images in test. They are 320x320 RGB PNG files. I thought I could load the entire dataset into memory (since it's just 100 MB), but when I try to do that I get a "[Errno 24] Too many open files: ..." error. The loading code looks like this:
from PIL import Image

train_images = []
for index, row in dataset_p_train.iterrows():
    path = data_path / row.img_path
    train_images.append(Image.open(path))
I know that opening 9000 files without closing them is bad practice, but my classifier relies heavily on PIL's img.getcolors() method, so I really want to keep the dataset in memory as a list of PIL images rather than as a 3000x320x320x3 uint8 numpy array, to avoid converting back to a PIL image every time I need an image's colors.
So, what should I do? Somehow increase the limit on open files? Or is there a way to make PIL images reside entirely in memory without being "opened" from disk?
CodePudding user response:
Image.open is lazy: it will not read the image data until you try to do something with it.
You can call the image's load method to explicitly load the file contents. This also closes the file, unless the image has multiple frames (for example, an animated GIF).
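Here is a minimal sketch of your loading loop with an explicit load call added (it reuses the dataset_p_train and data_path variables from your question, so that dataframe setup is assumed):

from PIL import Image

train_images = []
for index, row in dataset_p_train.iterrows():
    path = data_path / row.img_path
    img = Image.open(path)
    # Force Pillow to read the pixel data now; for a single-frame
    # image this also closes the underlying file handle.
    img.load()
    train_images.append(img)

Each image then lives entirely in memory, no file descriptors stay open, and img.getcolors() keeps working as before.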
See File Handling in Pillow for more details.