Is it possible to save boolean numpy arrays on disk as 1 bit per element with memmap support?

Time:05-20

Is it possible to save numpy arrays on disk in a boolean format where each element takes only 1 bit? This answer suggests using packbits and unpackbits; however, from the documentation it seems that this may not support memory mapping. Is there a way to store 1-bit arrays on disk with memmap support?

Reason for the memmap requirement: I'm training my neural network on a database of full HD (1920x1080) images, but in each iteration I randomly crop out a 256x256 patch. Since reading the full image is time-consuming, I use memmap to read only the required patch. Now I want to use a binary mask along with my images, hence this requirement.
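For context, the patch-reading setup described above might look like the minimal sketch below. It assumes each image was converted once to a raw uint8 file laid out as (height, width, channels) = (1080, 1920, 3); the file name `image_0000.raw` is hypothetical, and a random image stands in for a real one.

```python
import numpy as np

# Assumption: raw uint8 file with shape (1080, 1920, 3); hypothetical file name.
IMAGE_PATH = "image_0000.raw"
H, W, C = 1080, 1920, 3
PATCH = 256

# One-time conversion (a random image stands in for a real one here).
image = np.random.randint(0, 256, size=(H, W, C), dtype=np.uint8)
image.tofile(IMAGE_PATH)

# At training time: map the file instead of loading it into memory.
mapped = np.memmap(IMAGE_PATH, mode="r", dtype=np.uint8, shape=(H, W, C))

# Random crop; only the parts of the file holding the patch's rows are read
# (at OS page granularity).
rng = np.random.default_rng()
top = int(rng.integers(0, H - PATCH + 1))
left = int(rng.integers(0, W - PATCH + 1))
patch = np.array(mapped[top:top + PATCH, left:left + PATCH])  # copy into RAM
print(patch.shape)  # (256, 256, 3)
```

A 1-bit mask cannot be sliced this way directly, because memmap addresses whole bytes, which is what the question is about.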

Answer:

numpy does not support 1-bit-per-element arrays, so I doubt memmap has such a feature. However, there is a simple workaround using packbits.

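Since your case does not need bitwise random access (you always read a whole rectangular patch), you can store the packed bits and read them through memmap as an ordinary 1-byte-per-element (uint8) array, then unpack only the bytes that cover the patch.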
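To see why the packed result is an ordinary byte array that memmap can handle, here is a small sketch of what `np.packbits(..., axis=0)` does on a toy 10x3 mask (values chosen purely for illustration).

```python
import numpy as np

# A tiny 10x3 binary mask, stored at 1 byte per element in memory.
mask = np.array([[1, 0, 1]] * 10, dtype=np.uint8)
mask[3, 1] = 1

# Packing along axis 0 folds every 8 consecutive rows of a column into one
# byte (first row -> most significant bit); 10 rows round up to 2 bytes,
# and the missing bits are zero-padded.
packed = np.packbits(mask, axis=0)
print(packed.shape)     # (2, 3)
print(packed.tolist())  # [[255, 16, 255], [192, 0, 192]]

# Unpacking yields 16 rows; the last 6 are padding and must be cropped off.
restored = np.unpackbits(packed, axis=0)[:mask.shape[0]]
print(np.array_equal(restored, mask))  # True
```

So byte row `k` of the packed array holds image rows `8k` to `8k + 7`, which is why a crop starting at row `top` begins at byte row `top // 8`.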

```python
import numpy as np

# A binary mask represented as a 1-byte-per-element array.
full_size_mask = np.random.randint(0, 2, size=[1920, 1080], dtype=np.uint8)

# Pack the mask vertically: each byte holds 8 consecutive rows of one column.
packed_mask = np.packbits(full_size_mask, axis=0)

# Save as a memmap-compatible file ('w+' creates or overwrites it).
buffer = np.memmap("./temp.bin", mode='w+',
                   dtype=packed_mask.dtype, shape=packed_mask.shape)
buffer[:] = packed_mask
buffer.flush()
del buffer

# Open as a read-only memmap file.
packed_mask = np.memmap("./temp.bin", mode='r',
                        dtype=packed_mask.dtype, shape=packed_mask.shape)

# Rect where you want to crop.
top = 555
left = 777
width = 256
height = 256

# Read only the byte rows containing the rect (bottom index rounded up).
packed_top = top // 8
packed_bottom = (top + height + 7) // 8
packed_patch = packed_mask[packed_top:packed_bottom, left:left + width]

# Unpack and crop the actual area.
patch_top = top - packed_top * 8
patch_mask = np.unpackbits(packed_patch, axis=0)[patch_top:patch_top + height]

# Check that the mask is cropped from the correct area.
print(np.all(patch_mask == full_size_mask[top:top + height, left:left + width]))
```
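If you need this for many masks, the same steps can be wrapped in two small helpers. This is a sketch under assumptions: the names `save_packed_mask` and `read_mask_patch` are hypothetical, the file is written with `tofile` as raw bytes, and the mask is assumed to be laid out as (rows, cols) = (1080, 1920) for a full HD image.

```python
import numpy as np

def save_packed_mask(path, mask):
    """Pack a 2-D 0/1 mask along axis 0 and write it as raw uint8 bytes."""
    np.packbits(mask.astype(bool), axis=0).tofile(path)

def read_mask_patch(path, mask_shape, top, left, height, width):
    """Return mask[top:top+height, left:left+width] read through a memmap."""
    rows, cols = mask_shape
    packed = np.memmap(path, mode="r", dtype=np.uint8,
                       shape=((rows + 7) // 8, cols))
    packed_top = top // 8                    # first byte row touched
    packed_bottom = (top + height + 7) // 8  # one past the last, rounded up
    block = packed[packed_top:packed_bottom, left:left + width]
    offset = top - packed_top * 8            # 0..7 extra rows at the top
    return np.unpackbits(block, axis=0)[offset:offset + height]

# Usage with a random mask (hypothetical file name).
mask = (np.random.rand(1080, 1920) > 0.5).astype(np.uint8)
save_packed_mask("mask_0000.bin", mask)
patch = read_mask_patch("mask_0000.bin", mask.shape, 555, 777, 256, 256)
print(np.array_equal(patch, mask[555:811, 777:1033]))  # True
```

Writing with `tofile` and reading with `np.memmap` keeps the on-disk format a plain byte array, so the only metadata you need to keep alongside it is the unpacked mask shape.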

Note that this solution can (and usually will) read extra bits: at most 7 extra rows at each end of every column. In your case that is at most 7 × 2 × 256 = 3,584 bits, only about 5% of the 65,536-bit patch, so I believe the overhead is negligible.
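The bound can be checked by brute force over every possible crop offset. The sketch below assumes the bottom byte row is rounded up as `(top + height + 7) // 8` and a mask height of 1080 rows. It suggests that 7 rows per end is the right general bound, but when the crop height is a multiple of 8 (such as 256) the two ends together never exceed 8 rows.

```python
def extra_rows(top, height):
    """Rows read above and below the crop when packed_bottom is rounded up."""
    packed_top = top // 8
    packed_bottom = (top + height + 7) // 8
    above = top - packed_top * 8
    below = packed_bottom * 8 - (top + height)
    return above, below

MASK_ROWS = 1080  # assumed full HD mask height

for height in (256, 250):
    offsets = [extra_rows(top, height) for top in range(MASK_ROWS - height + 1)]
    worst_end = max(max(a, b) for a, b in offsets)
    worst_total = max(a + b for a, b in offsets)
    print(height, worst_end, worst_total)
# 256 7 8
# 250 7 14
```

For a 256x256 crop the overhead is therefore at most 8 × 256 = 2,048 bits, about 3% of the patch.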

By the way, this is not an answer to your question, but when you are dealing with binary masks such as labels for image segmentation, compressing them with zip can drastically reduce the file size, possibly to less than 8 KB per image (not per patch). You might want to consider this option as well.
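One way to try this is `np.savez_compressed`, which writes a zip archive with deflate compression. The sketch assumes a synthetic, segmentation-like mask (a disc plus a rectangle); real sizes depend entirely on your masks, and random noise would barely compress. Keep in mind that a deflate-compressed member cannot be memory-mapped, so the whole mask is decompressed when it is loaded.

```python
import io
import numpy as np

# Synthetic mask (assumption): a filled disc and rectangle on a 1080x1920 grid.
rows, cols = 1080, 1920
yy, xx = np.ogrid[0:rows, 0:cols]
mask = (yy - 500) ** 2 + (xx - 900) ** 2 < 300 ** 2
mask[100:300, 1200:1700] = True

# Save into an in-memory buffer with zip/deflate compression.
buf = io.BytesIO()
np.savez_compressed(buf, mask=mask)
print(mask.nbytes, len(buf.getvalue()))  # raw bytes vs compressed bytes

# Loading inflates the whole array; there is no partial/memmap read here.
buf.seek(0)
restored = np.load(buf)["mask"]
print(np.array_equal(restored, mask))  # True
```

Writing to a real file works the same way with a path such as `"mask_0000.npz"` (hypothetical name) in place of the buffer.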
