I am trying to find a memory-efficient way to store data in Python variables for quick access and analysis. I initialize a 2D array in NumPy and then find its memory usage (using sys.getsizeof, so I can compare it to other variable types later) via the following:
import sys
import numpy as np
a = np.zeros((1000, 1000), dtype=np.float32)
print('The size of the numpy array is {} bytes'.format(sys.getsizeof(a)))
Which returns: The size of the numpy array is 4000112 bytes
I can move this into a dictionary of 1D NumPy arrays with the following for loop:
b = {}
for ii in range(1000):
    b[f'{ii}'] = a[:, ii]
print('The size of the dictionary is {} bytes'.format(sys.getsizeof(b)))
Which returns: The size of the dictionary is 36968 bytes. The dictionary size persists even if I delete a and run garbage collection, so b can't just be a container pointing to a.
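For concreteness, the deletion test looks like this (a sketch of the steps described above; the exact byte count may vary by platform):

import gc
del a          # removes the name a, not necessarily the object
gc.collect()   # force a collection pass
print('The size of the dictionary is {} bytes'.format(sys.getsizeof(b)))
# still reports 36968 bytes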
Why would a dictionary of 1D arrays take up less memory than those same arrays in an ndarray?
CodePudding user response:
There are two fundamental mistakes in your observation.

1. You cannot delete an object, only references to it. del a removes the name a, i.e. one pointer; only once all pointers are gone might the object eventually be reclaimed by the garbage collector.
2. sys.getsizeof only gives the size of the container itself. To get the total size you need to loop over the elements and sum their sizes.
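In other words, the dictionary in the question is exactly such a container of references: basic slicing in NumPy returns views, and sys.getsizeof does not count a view's borrowed data. A quick check, as a sketch (np.shares_memory and the .base attribute are standard NumPy; the exact header size varies by version):

import sys
import numpy as np

a = np.zeros((1000, 1000), dtype=np.float32)
col = a[:, 0]                    # basic slicing returns a view, not a copy
print(col.base is a)             # True: col borrows a's buffer
print(np.shares_memory(col, a))  # True: same underlying memory
print(sys.getsizeof(col))        # only ~100 bytes: array header, no data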
Demonstration that the size is roughly the same:
b = {}
for ii in range(1000):
    b[f'{ii}'] = a[:, ii].copy()  # copy so each value owns its own data

sum(sys.getsizeof(e) for e in b.values())
# 4096000
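If you only care about the raw buffer sizes, ndarray.nbytes sidesteps the per-object overhead entirely; a fuller estimate would also count the dict container and its string keys. A sketch, reusing a and b from above:

# raw data only, no Python object overhead
print(a.nbytes)                           # 4000000
print(sum(v.nbytes for v in b.values()))  # 4000000

# fuller estimate: dict container + string keys + array objects
total = (sys.getsizeof(b)
         + sum(sys.getsizeof(k) for k in b)
         + sum(sys.getsizeof(v) for v in b.values()))
print(total)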