Memory issue with cv2.imread

I am trying to read a large number (54K) of 512x512x3 .png images into an array, so that I can build a dataset from it and feed that to a Keras model. I am using the code below, but I get a cv2 OutOfMemory error (at around image 50K) pointing to the fourth line of my code. From what I have read so far: I am using the 64-bit version, and the images cannot be resized because the input representation is fixed. Is there anything that can be done on the memory-management side to make it work?

'''
# Images (512x512x3)
X_data = []
files = glob.glob('C:\\Users\\77901677\\Projects\\images1\\*.png')
for myFile in files:
    image = cv2.imread(myFile)
    X_data.append(image)

dataset_image = np.array(X_data)

# Annotations (multilabel) 512x512x2
Y_data = []
files = glob.glob('C:\\Users\\77901677\\Projects\\annotations1\\*.png')
for myFile in files:
    mask = cv2.imread(myFile)
    # Gets rid of first channel which is empty
    mask = mask[:, :, 1:]
    Y_data.append(mask)
dataset_mask = np.array(Y_data)

'''

Any ideas or advice are welcome.

CodePudding user response：

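You can reduce memory use by dropping one of your variables. At the moment you hold the data twice: once in the Python list (X_data) and once more in the NumPy array, because np.array(X_data) copies everything into a new block of memory.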
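To put rough numbers on this, here is a back-of-the-envelope estimate. It assumes 8-bit pixels (what cv2.imread returns for ordinary PNGs by default) and takes "54K" as exactly 54,000 images, so treat the figures as approximate:

```python
# Rough memory estimate, assuming uint8 pixels (1 byte per channel value).
n_images = 54_000            # "54K" from the question, taken as a round figure
image_bytes = 512 * 512 * 3  # one 512x512x3 image
mask_bytes = 512 * 512 * 2   # one 512x512x2 mask (after dropping a channel)

images_total = n_images * image_bytes
masks_total = n_images * mask_bytes

GIB = 1024 ** 3
print(f"images: {images_total / GIB:.1f} GiB")                            # images: 39.6 GiB
print(f"masks:  {masks_total / GIB:.1f} GiB")                             # masks:  26.4 GiB
print(f"list + array copy of images: {2 * images_total / GIB:.1f} GiB")  # 79.1 GiB
```

So even a single copy of the images needs about 40 GiB of RAM, and the list plus the array copy doubles that while both are alive.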

You could use yield for this to create a generator, which loads your files one at a time instead of storing them all in an auxiliary variable.
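To see the "one at a time" behaviour in isolation, here is a toy sketch with no image files involved. The names lazy_reader and loaded are made up for the illustration, and the string operation just stands in for cv2.imread:

```python
loaded = []  # records which "files" have actually been read so far

def lazy_reader(names):
    for name in names:
        loaded.append(name)  # stands in for cv2.imread(name)
        yield name.upper()   # stands in for the decoded image

reader = lazy_reader(["a.png", "b.png", "c.png"])
# Creating the generator reads nothing; the body only runs when a value is requested.
first = next(reader)
print(first, loaded)  # prints: A.PNG ['a.png']
```

Only the first item has been touched after one next() call, which is why a generator never needs the whole dataset in memory at once.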

import glob
import cv2
import numpy as np

def myGenerator(files):
    for myFile in files:
        mask = cv2.imread(myFile)
        # Gets rid of first channel which is empty
        yield mask[:, :, 1:]

files = glob.glob('C:\\Users\\77901677\\Projects\\annotations1\\*.png')

# initialise your numpy array here, with shape (N, H, W, C)
# uint8 matches what cv2.imread returns; the np.zeros default (float64) needs 8x the memory
yData = np.zeros((len(files), 512, 512, 2), dtype=np.uint8)

# initialise the generator
mygenerator = myGenerator(files)  # create a generator
for i, data in enumerate(mygenerator):
    yData[i] = data  # load the data

However, this is still not optimal for you. If you plan to train a model in the next step, you will very likely run into memory issues again, because the full array still has to fit in RAM. In Keras, you can additionally implement a Keras Sequence generator, which loads your files in batches (similar to the yield generator above) and feeds them to your model during training. I recommend this article here, which demonstrates an easy implementation of it; it's what I use for my Keras/TF model pipelines.
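As a minimal sketch of the batching idea (not the implementation from the linked article), the class below serves one batch at a time through __len__ and __getitem__, which is the interface a Keras Sequence is built around. The names PngBatchLoader, load_fn and fake_load are hypothetical. In a real pipeline you would presumably subclass keras.utils.Sequence and pass cv2.imread as load_fn; here a fake loader is used so the example runs without any image files:

```python
import math
import numpy as np

class PngBatchLoader:
    """Loads (images, masks) one batch at a time instead of all at once."""

    def __init__(self, image_paths, mask_paths, batch_size, load_fn):
        assert len(image_paths) == len(mask_paths)
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.batch_size = batch_size
        self.load_fn = load_fn  # e.g. cv2.imread in the real pipeline (assumption)

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        images = np.stack([self.load_fn(p) for p in self.image_paths[lo:hi]])
        # Drop the empty first channel of each mask, as in the question.
        masks = np.stack([self.load_fn(p)[:, :, 1:] for p in self.mask_paths[lo:hi]])
        return images, masks

# Stand-in loader so the sketch runs without files: returns a blank 8x8x3 image.
def fake_load(path):
    return np.zeros((8, 8, 3), dtype=np.uint8)

names = [f"img_{i}.png" for i in range(5)]
loader = PngBatchLoader(names, names, batch_size=2, load_fn=fake_load)
images, masks = loader[0]
print(len(loader), images.shape, masks.shape)  # 3 (2, 8, 8, 3) (2, 8, 8, 2)
```

With this layout only batch_size images and masks are in memory at any moment. One thing to watch: the image and mask path lists must be in the same order (for example by sorting both), otherwise images get paired with the wrong masks.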

It's good practice to use generators when feeding our models large amounts of data.