I'm saving NumPy arrays while trying to use as little disk space as possible. Along the way I realized that saving a boolean NumPy array takes no less disk space than saving a uint8 array of the same shape. Is there a reason for that, or am I doing something wrong here?
Here is a minimal example:
```python
import sys
import numpy as np
rand_array = np.random.randint(0, 2, size=(100, 100), dtype=np.uint8)  # random binary (0/1) array
array_uint8 = rand_array * 255  # values 0 or 255, dtype uint8
array_bool = np.array(rand_array, dtype=bool)  # same pattern as True/False, dtype bool
print(f"size array uint8 {sys.getsizeof(array_uint8)}")
# ==> size array uint8 10120
print(f"size array bool {sys.getsizeof(array_bool)}")
# ==> size array bool 10120
np.save("array_uint8", array_uint8, allow_pickle=False, fix_imports=False)  # writes array_uint8.npy
# file size on disk: 10128 bytes
np.save("array_bool", array_bool, allow_pickle=False, fix_imports=False)  # writes array_bool.npy
# file size on disk: 10128 bytes
```
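A quick way to see why the two sizes match is to check each array's `itemsize` (bytes per element) and `nbytes` (bytes in the data buffer), and to measure what `np.save` writes by saving into an in-memory `io.BytesIO` buffer instead of a file. This sketch uses a seeded `np.random.default_rng` in place of the question's `np.random.randint` (an assumption made only for repeatability); both arrays should report an itemsize of 1 and 10000 data bytes, and the saved sizes should be identical.

```python
import io
import numpy as np

# Same kind of 0/1 data as in the question, seeded so reruns match
rng = np.random.default_rng(0)
rand_array = rng.integers(0, 2, size=(100, 100), dtype=np.uint8)
array_uint8 = rand_array * 255       # dtype stays uint8
array_bool = rand_array.astype(bool)  # dtype bool

saved = {}
for name, arr in (("uint8", array_uint8), ("bool", array_bool)):
    buf = io.BytesIO()
    np.save(buf, arr)  # the same bytes np.save would write to a .npy file
    saved[name] = len(buf.getvalue())
    # itemsize: bytes per element; nbytes: size of the raw data buffer
    print(name, arr.itemsize, arr.nbytes, saved[name])
```

The difference between the saved size and `nbytes` is the `.npy` header, which is why the question's files are slightly larger than 10000 bytes.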
Answer:
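The `uint8` and `bool` data types both occupy one byte per element (NumPy does not store booleans as single bits, because its element addressing via strides works in whole bytes), so arrays of equal shape always occupy the same memory, and `np.save` writes that same data buffer to disk plus a small header. If you are aiming to reduce your footprint, you can pack the boolean values as bits into a `uint8` array using `numpy.packbits` (read here), which stores eight booleans per byte and so shrinks the data to roughly one eighth of its original size. To get the original array back, use `numpy.unpackbits` and reshape the result to the original shape.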
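A minimal sketch of the pack/save/unpack round trip, assuming the same (100, 100) boolean data as the question (generated here with a seeded `np.random.default_rng`). The helper `saved_size` is hypothetical, written only for this illustration; it measures what `np.save` would write by saving into an `io.BytesIO` buffer.

```python
import io
import numpy as np

# Illustrative data: random 0/1 values as booleans, seeded for repeatability
rng = np.random.default_rng(0)
array_bool = rng.integers(0, 2, size=(100, 100)).astype(bool)

def saved_size(arr):
    """Hypothetical helper: bytes np.save writes for arr (.npy header + data)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return len(buf.getvalue())

# Pack 8 booleans into each uint8 byte. With the default axis=None the
# array is flattened first, so the original shape is not kept in `packed`.
packed = np.packbits(array_bool)
print(array_bool.nbytes, packed.nbytes)  # 10000 1250
print(saved_size(array_bool), saved_size(packed))

# Unpack: unpackbits returns flat uint8 0/1 values; count= drops any padding
# bits added to fill the last byte, then restore the shape and the bool dtype.
restored = (
    np.unpackbits(packed, count=array_bool.size)
    .reshape(array_bool.shape)
    .astype(bool)
)
assert np.array_equal(restored, array_bool)
```

Because `packbits` flattens the array by default, the shape has to be stored somewhere alongside the packed bytes; one option (an assumption, not part of the answer above) is to save both into a single file with `np.savez`.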