Let's say that we have a numpy array storing large objects. My goal is to delete one of these objects from memory but retain the initial structure of the array. The cell under which this object was stored might be filled, for example, with None.
Example of the simplified behaviour, where I replaced the large objects with characters (dtype=object because the real array stores objects):
import numpy

arr = numpy.asarray(['a', 'b', 'c'], dtype=object)  # arr = ['a', 'b', 'c']
delete_in_place(arr, 0)                             # arr = [None, 'b', 'c']
I can't do this by calling numpy.delete(), because it just returns a new array with one element removed, and that new array takes additional space in memory. It also changes the shape (by dropping the given index), which I want to avoid.
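For illustration, this is the behaviour I want to avoid (a small sketch; the strings stand in for the large objects):
import numpy

arr = numpy.asarray(['a', 'b', 'c'], dtype=object)
smaller = numpy.delete(arr, 0)  # allocates a brand new array
print(smaller.shape)            # (2,) -- the shape changed
print(arr.shape)                # (3,) -- the original still holds all its objects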
My other idea was to just set arr[0] = None and call the garbage collector, but I'm not sure what the exact behaviour of such a procedure would be.
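Roughly, I mean something like this (a sketch; Big is just a placeholder for my real objects, and the comments reflect what I expect rather than what I know):
import gc
import numpy

class Big:
    """Placeholder for one of the large objects."""

arr = numpy.asarray([Big(), Big(), Big()], dtype=object)

arr[0] = None  # drops the array's reference to the first object
gc.collect()   # presumably frees it, if nothing else references it

print(arr.shape)  # (3,) -- the structure is preserved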
Do you have any ideas on how to do it?
CodePudding user response:
When you create a numpy array, it has a fixed size. Consequently, when you try to delete an element, numpy will create a new array. The way you are trying to do it is not effective; please try another library or data structure.
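For instance, if a plain Python list would do, it already behaves the way the question describes (a minimal sketch, with bytearray objects standing in for the large ones):
big_objects = [bytearray(10**6) for _ in range(3)]  # stand-ins for large objects

big_objects[0] = None  # the list keeps its length of 3
# The first bytearray is now unreferenced, so CPython can free it right away.
assert len(big_objects) == 3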
CodePudding user response:
You can do this with a multi-dimensional array and not even get pandas or numpy involved. You will need the assistance of the gc module and the built-in del statement, but that's the extent of things.
For example:
import gc

with open('large-dataset.txt') as fh:
    raw_data = fh.readlines()

# Parse or object creation what-not; obj_create is a placeholder, and each
# line gets its own sub-list so the structure is multi-dimensional.
large_objs_multidim = [[obj_create(i)] for i in raw_data]
...
# No longer need a reference to the first large object
del large_objs_multidim[0][0]

# Python doesn't make guarantees about when collection happens; read up on ref-counts.
gc.collect()
This gives the general idea of how to invoke the garbage collector yourself. There are some nuances to Python's reference counting and objects in memory. I don't know the intricacies of your project and code, but you might benefit from reading into __weakref__ too...
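For instance, the weakref module (which relies on the __weakref__ attribute) lets you confirm that an object was actually collected; a sketch:
import gc
import weakref

class Large:
    """Placeholder for a large object."""

data = [[Large()]]
probe = weakref.ref(data[0][0])  # a weak reference does not keep the object alive

del data[0][0]
gc.collect()

print(probe() is None)  # True once the object has been collected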
Also, this link for further reading:
https://stackoverflow.com/a/1316793/1230086