Memory issue with cv.imread


I am trying to read a large number (54K) of 512x512x3 .png images into an array to build a dataset and then feed it to a Keras model. I am using the code below, but I get an OpenCV out-of-memory error (at around image 50K) pointing at the cv2.imread call in the loop. I have been reading a bit about it: I am already using the 64-bit version, and the images cannot be resized because the fixed input representation is required. Is there anything that can be done on the memory-management side to make this work?

import glob
import cv2
import numpy as np

# Images (512x512x3)
X_data = []
files = glob.glob('C:\\Users\\77901677\\Projects\\images1\\*.png')
for myFile in files:
    image = cv2.imread(myFile)
    X_data.append(image)
dataset_image = np.array(X_data)

# Annotations (multilabel), 512x512x2
Y_data = []
files = glob.glob('C:\\Users\\77901677\\Projects\\annotations1\\*.png')
for myFile in files:
    mask = cv2.imread(myFile)
    # Gets rid of the first channel, which is empty
    mask = mask[:, :, 1:]
    Y_data.append(mask)
dataset_mask = np.array(Y_data)

Any ideas or advice are welcome.

CodePudding user response:

You can reduce the memory usage by cutting one of your variables, because at the moment you hold the data twice: once in the Python list and once more in the NumPy array built from it. For 54K images of 512x512x3 uint8 pixels that is roughly 42 GB, and both copies are alive while np.array(X_data) runs.

You could use yield for this, creating a generator that loads your files one at a time instead of storing them all in an auxiliary list first:

import glob
import cv2
import numpy as np

def myGenerator():
    files = glob.glob('C:\\Users\\77901677\\Projects\\annotations1\\*.png')
    for myFile in files:
        mask = cv2.imread(myFile)
        # Gets rid of the first channel, which is empty
        yield mask[:, :, 1:]

# Pre-allocate the target array once; N is the number of mask files, and each
# mask is 512x512 with the 2 remaining channels. uint8 matches what cv2.imread
# returns and is 8x smaller than NumPy's default float64.
N = len(glob.glob('C:\\Users\\77901677\\Projects\\annotations1\\*.png'))
yData = np.zeros((N, 512, 512, 2), dtype=np.uint8)

# Initialise the generator and fill the array one mask at a time
mygenerator = myGenerator()
for i, data in enumerate(mygenerator):
    yData[i] = data

But this alone is not optimal for you: if you plan to train a model in the next step, you will hit memory issues again, because the full dataset still has to fit in RAM. In Keras, you can additionally implement a Sequence generator (keras.utils.Sequence), which loads your files in batches, similarly to the yield generator above, and feeds them to your model during the training stage. I recommend this article, which demonstrates an easy implementation of it; it's what I use for my Keras/TF model pipelines.
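
A minimal sketch of such a Sequence, assuming TensorFlow 2.x with tf.keras; the MaskSequence class name, the batch size, and the paired, sorted file lists below are illustrative assumptions, not from the original post:

import glob
import cv2
import numpy as np
from tensorflow.keras.utils import Sequence

class MaskSequence(Sequence):  # hypothetical class name
    def __init__(self, image_files, mask_files, batch_size):
        self.image_files = image_files
        self.mask_files = mask_files
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(len(self.image_files) / self.batch_size))

    def __getitem__(self, idx):
        # Read only the files belonging to batch `idx` from disk
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        x = np.array([cv2.imread(f) for f in self.image_files[lo:hi]])
        y = np.array([cv2.imread(f)[:, :, 1:] for f in self.mask_files[lo:hi]])
        return x, y

# Usage sketch: sorting keeps images and masks paired by filename
images = sorted(glob.glob('C:\\Users\\77901677\\Projects\\images1\\*.png'))
masks = sorted(glob.glob('C:\\Users\\77901677\\Projects\\annotations1\\*.png'))
seq = MaskSequence(images, masks, batch_size=8)
# model.fit(seq, epochs=10)

With this, Keras pulls one batch at a time during training, so only batch_size images are ever in memory at once.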

It's good practice to use generators when feeding our models large amounts of data.
