How to delete an entry from a numpy array in place and retain its initial shape?

Time:10-30

Let's say that we have a numpy array storing large objects. My goal is to delete one of these objects from memory but retain the initial structure of the array. The cell that held the object could then be filled with, for example, None.

A simplified example of the desired behaviour, with characters standing in for the large objects:

arr = numpy.asarray(['a', 'b', 'c']) # arr = ['a', 'b', 'c']
delete_in_place(arr, 0)              # arr = [None, 'b', 'c']

I can't do this by calling numpy.delete(), because it returns a new array without the element, which takes additional space in memory. It also changes the shape (by getting rid of the given index), which I want to avoid.
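To make the concern concrete, here is a minimal sketch of what numpy.delete() does with the example array. The variable name smaller is only illustrative.

```python
import numpy as np

arr = np.asarray(['a', 'b', 'c'])

# numpy.delete() does not modify arr; it builds and returns a new array.
smaller = np.delete(arr, 0)

print(arr.shape)                       # the original keeps all three slots
print(smaller.shape)                   # the copy has one slot fewer
print(np.shares_memory(arr, smaller))  # the copy has its own buffer
```

Both arrays exist side by side until arr is released, which is the extra memory cost described above.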

My other idea was to just set arr[0] = None and call the garbage collector, but I'm not sure what the exact behaviour of such a procedure would be.
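A rough sketch of that idea, under a few assumptions: the array has dtype=object (so each cell holds a reference rather than raw characters), the code runs on CPython, and nothing else refers to the stored object. LargeObject and delete_in_place are hypothetical names used only for illustration.

```python
import weakref
import numpy as np


class LargeObject:
    """Hypothetical stand-in for a large payload."""

    def __init__(self, name):
        self.name = name


def delete_in_place(arr, index):
    """Drop the array's reference to arr[index]; the shape is unchanged."""
    arr[index] = None


# An object array stores references, so a cell can be cleared individually.
arr = np.empty(3, dtype=object)
for i, name in enumerate('abc'):
    arr[i] = LargeObject(name)

# A weak reference lets us observe the object without keeping it alive.
probe = weakref.ref(arr[0])

delete_in_place(arr, 0)

print(arr.shape)        # still three slots
print(arr[0] is None)   # the cell now holds None
print(probe() is None)  # on CPython the object is freed once its last reference is gone
```

In this sketch no explicit garbage collector call is needed, because CPython's reference counting releases the object as soon as the array stops referring to it. If another variable still points at the object, it stays in memory regardless of what the array holds.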

Do you have any ideas on how to do it?

CodePudding user response:

A numpy array has a fixed size once it is created. So when you try to delete an element, numpy creates a new array.

The approach you are trying is not an effective one. Please try another library.

CodePudding user response:

You can do this with a multi-dimensional array (a nested list) and not even get pandas or numpy involved. You will need the assistance of the gc module and the builtin del statement, but that's the extent of things.

For example:

import gc

with open('large-dataset.txt') as fh:
    raw_data = fh.readlines()

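# Parse each line into objects; obj_create is a placeholder for your own
# constructor, assumed here to return a list of large objects per line.
large_objs_multidim = [obj_create(i) for i in raw_data]
...
# Drop the reference to the large object once it is no longer needed.
# Note that del removes the slot itself, so the inner list gets one element shorter.
del large_objs_multidim[0][0]
# Python makes no guarantee about when an object is collected; read up on reference counts.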
gc.collect()

This gives the general idea of how to invoke the garbage collector yourself. There are some nuances to Python's reference counting and objects in memory. I don't know the intricacies of your project and code, but you might benefit from reading into __weakref__ too...
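One of those nuances can be sketched as follows, assuming CPython. Here the slot is overwritten with None instead of being removed with del, which is a deliberate swap so the nested shape stays intact. Node is a hypothetical class that refers to itself, so reference counting alone cannot free it and gc.collect() does the actual work.

```python
import gc
import weakref


class Node:
    """Hypothetical large object that refers to itself, forming a cycle."""

    def __init__(self):
        self.me = self


grid = [[Node(), Node()], [Node(), Node()]]

# A weak reference lets us observe the object without keeping it alive.
probe = weakref.ref(grid[0][0])

gc.disable()       # keep automatic cycle collection out of the demonstration
grid[0][0] = None  # the slot stays, so the nested shape is preserved

# The self-reference keeps the object alive even though the list let go of it.
alive_before = probe() is not None

gc.collect()       # the cycle collector finds and frees the unreachable object
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
print(len(grid), len(grid[0]))
```

For objects without reference cycles, clearing the slot is enough on CPython and the explicit gc.collect() call changes nothing.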

Also, these links for further reading:

https://stackoverflow.com/a/1316793/1230086

https://stackoverflow.com/a/9908216/1230086

https://docs.python.org/3/library/gc.html
