How to read metadata of npz file without loading it into memory

I would like to know whether it is possible to get the shape of a NumPy array saved on disk without loading the array into memory. This is possible with .hdf5 files, but I don't know if it is possible with .npz files. Something like:

import numpy as np

arr1 = np.arange(10000)
arr2 = np.arange(10000)

np.savez('tmp/my_arrays.npz', arr1 = arr1, arr2 = arr2)

my_arrays = np.load('tmp/my_arrays.npz')

# this loads the array into memory
my_arrays['arr1'].shape

# looking for something perhaps like
my_arrays.arr1.shape

CodePudding user response:

An .npz file is just a zip archive containing one .npy file per array. It can be opened with zipfile, and then the functions in numpy.lib.format can be used to read only each header (shape, fortran_order, dtype) without loading the array data.
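
That the file is an ordinary zip archive can be checked with the standard library alone. The following is a minimal sketch, assuming a throwaway file written to a temporary directory; the helper `list_npz_members` and the array names `a` and `b` are made up for illustration.

```python
import os
import tempfile
import zipfile

import numpy as np


def list_npz_members(file_name):
    """Return (member name, uncompressed size in bytes) pairs from the zip directory."""
    with zipfile.ZipFile(file_name, mode="r") as zip_file:
        return [(info.filename, info.file_size) for info in zip_file.infolist()]


# Write a small example archive, then inspect it without calling np.load.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npz")
    np.savez(path, a=np.zeros((3, 4)), b=np.arange(5))
    members = list_npz_members(path)

for name, size in members:
    print(name, size)
```

Each keyword passed to np.savez shows up as a member with the .npy suffix appended, and the sizes come from the zip directory, so no array data is decompressed.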

Example

import zipfile
import numpy as np

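def read_metadata(file_name):
    """Return (name, shape, fortran_order, dtype) for each array in an .npz file."""
    metadata = []
    with zipfile.ZipFile(file_name, mode='r') as zip_file:
        for arr_name in zip_file.namelist():
            with zip_file.open(arr_name, 'r') as fp:
                # The magic string at the start carries the .npy format version.
                version = np.lib.format.read_magic(fp)

                # Only the header is parsed; the array data is never read.
                if version[0] == 1:
                    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)
                elif version[0] == 2:
                    shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(fp)
                else:
                    # Fail loudly instead of appending undefined values.
                    raise ValueError(f"Unsupported .npy format version {version} in {arr_name!r}")
            metadata.append((arr_name, shape, fortran_order, dtype))
    return metadata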

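read_metadata('tmp/my_arrays.npz')
# The integer dtype is platform-dependent; np.arange typically gives int64 on 64-bit Linux and macOS.
#[('arr1.npy', (10000,), False, dtype('int32')),
# ('arr2.npy', (10000,), False, dtype('int32'))]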
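
Once the headers are available, the size each array would take in memory can be estimated without reading any data, since it is the product of the shape entries times `dtype.itemsize`. A minimal sketch, assuming metadata tuples in the same (name, shape, fortran_order, dtype) layout that `read_metadata` returns; `estimate_nbytes` is a hypothetical helper and the sample list is hand-written for illustration.

```python
import math

import numpy as np


def estimate_nbytes(metadata):
    """Map each array name to the number of bytes it would occupy once loaded."""
    return {
        name: math.prod(shape) * dtype.itemsize
        for name, shape, _fortran_order, dtype in metadata
    }


# Hand-written sample in the (name, shape, fortran_order, dtype) layout.
sample = [
    ("arr1.npy", (10000,), False, np.dtype("int32")),
    ("arr2.npy", (100, 50), False, np.dtype("float64")),
]
sizes = estimate_nbytes(sample)
print(sizes)
```

This can help decide whether an array is small enough to load at all before calling np.load on the archive.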