Delete numpy arrays from memory after loading into tensorflow

Time:02-05

I have 4 NumPy arrays, x_train, x_test, y_train and y_test, which consume about 5 GB of memory. I have loaded them into TensorFlow with the following code:

import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

train_dataset and test_dataset together use about 8 GB of memory. The problem is that I am running out of memory, and I no longer need the NumPy arrays. How can I free them from memory?

I tried del <variable_name> in Python, but it seems to delete only the reference (the name) and does not free the memory.

Setting the variables to 0 also doesn't work.
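A minimal sketch of what del and reassignment actually do, assuming CPython's reference counting: both only remove one name's reference, and the array's buffer is released as soon as its last reference disappears. The name `other` is a hypothetical stand-in for anything else that might still hold the array; the weak reference is only there to check whether the array still exists.

```python
import weakref

import numpy as np

x_train = np.zeros(1_000_000)  # about 8 MB of float64
other = x_train                # hypothetical second reference to the same array
alive = weakref.ref(x_train)   # a weak reference does not keep the array alive

del x_train                    # removes the name x_train, not the array itself
still_alive = alive() is not None
print(still_alive)             # True: `other` still refers to the array

other = 0                      # rebinding drops the last strong reference
freed = alive() is None
print(freed)                   # True: CPython released the buffer right away
```

So if del appears to free nothing, the likely explanation is that some other object still references the arrays.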

Answer:

I suggest you try the following:

1-> It is possible that tf.data.Dataset.from_tensor_slices keeps a view over the original arrays, in which case their memory cannot be freed. In any case, try putting this part inside a function, like this:

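import numpy as np
import tensorflow as tf
def load_data():
  # Load your NumPy arrays here; these .npy file names are placeholders
  x_train, y_train = np.load("x_train.npy"), np.load("y_train.npy")
  x_test, y_test = np.load("x_test.npy"), np.load("y_test.npy")
  return tf.data.Dataset.from_tensor_slices((x_train, y_train)), tf.data.Dataset.from_tensor_slices((x_test, y_test))
train_dataset, test_dataset = load_data()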

I expect that when the function returns, the local variables inside its scope (including your NumPy arrays) will be released. However, since you mentioned that del didn't work, this may not work either. But hey, Python sometimes acts in mysterious ways.
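To illustrate why leaving a function only helps when nothing returned still points at the arrays, here is a NumPy-only sketch (TensorFlow is left out, and whether from_tensor_slices really keeps such a reference is not established here). A slice like `big[:10]` is a view that keeps its entire base array alive through `.base`, while `.copy()` does not. The function names are hypothetical.

```python
import weakref

import numpy as np

def make_view():
    big = np.zeros(1_000_000)
    ref = weakref.ref(big)
    # A basic slice is a view: it references `big` through its .base attribute
    return big[:10], ref

def make_copy():
    big = np.zeros(1_000_000)
    ref = weakref.ref(big)
    # .copy() allocates new memory and holds no reference to `big`
    return big[:10].copy(), ref

small_view, view_ref = make_view()
small_copy, copy_ref = make_copy()

# The local name `big` is gone, but the small view keeps the whole array alive
view_keeps_base = view_ref() is not None
print(view_keeps_base)  # True

# With a copy, nothing refers to `big` after the return, so it was freed
copy_freed = copy_ref() is None
print(copy_freed)       # True
```

The point is that function scope releases names, not objects: memory only goes away when no surviving object refers to it.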

2-> If option 1 doesn't work, try using memory mapping (https://numpy.org/doc/stable/reference/generated/numpy.memmap.html).
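A minimal sketch of the memory-mapping idea with numpy.memmap, assuming a float32 array of a made-up shape (1000, 28, 28) written to a temporary file; the path, dtype and shape are placeholders for your own data. The data lives on disk and pages are read in on demand, instead of the whole array sitting in RAM. Whether this lowers peak memory once the data is handed to from_tensor_slices is not guaranteed, since TensorFlow may still copy it into a tensor.

```python
import os
import tempfile

import numpy as np

# Hypothetical location for the on-disk copy of one array
path = os.path.join(tempfile.mkdtemp(), "x_train.dat")
shape = (1000, 28, 28)

# One-time step: write the data to disk through a writable memmap
x_train = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
x_train[:] = np.random.rand(*shape)  # stand-in for your real data
x_train.flush()                      # make sure everything reaches the file
del x_train

# Later: reopen read-only; a raw memmap file has no header, so dtype and shape are repeated
x_train = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
print(x_train.shape, x_train.dtype)  # (1000, 28, 28) float32
```

A simpler variant is to save with np.save and reopen with np.load(path, mmap_mode="r"), because a .npy file stores its dtype and shape in its header.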

Answer:

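According to the official Python documentation, you can call the garbage collector after deleting the variables. gc.collect() runs a full collection, which frees objects that are no longer reachable (including ones kept alive only by reference cycles):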

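import gc
del x_train, x_test, y_train, y_test  # first drop your own references to the arrays
gc.collect()  # then force a full garbage collection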
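A sketch of when gc.collect() actually makes a difference, assuming CPython: an object that is unreachable only because of reference counting is freed immediately by del, but one caught in a reference cycle waits for the cycle collector. The Holder class is hypothetical, and the automatic collector is disabled so the demo behaves deterministically.

```python
import gc
import weakref

import numpy as np

class Holder:
    """Hypothetical object that stores an array and refers to itself."""
    def __init__(self, arr):
        self.arr = arr
        self.me = self  # reference cycle: refcounting alone cannot free this

gc.disable()  # keep the automatic collector from running mid-demo
h = Holder(np.zeros(1_000_000))
arr_ref = weakref.ref(h.arr)

del h
alive_after_del = arr_ref() is not None
print(alive_after_del)        # True: the cycle keeps the array alive

gc.collect()                  # a full collection finds and frees the cycle
freed_after_collect = arr_ref() is None
print(freed_after_collect)    # True
gc.enable()
```

If the arrays are not part of any cycle, gc.collect() will not free anything extra; the remaining memory is then held by some live reference.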