I would like to know whether it is possible to get the shape of a NumPy array saved on disk without loading the array into memory. This is possible with .hdf5 files, but I don't know whether it is possible with .npz files. Something like:
import numpy as np
arr1 = np.arange(10000)
arr2 = np.arange(10000)
np.savez('tmp/my_arrays.npz', arr1 = arr1, arr2 = arr2)
my_arrays = np.load('tmp/my_arrays.npz')
# this loads the array into memory
my_arrays['arr1'].shape
# looking for something perhaps like
my_arrays.arr1.shape
CodePudding user response:
An .npz file is just a zip archive in which each array is stored as a separate .npy file. It can be opened with zipfile, and then the functions in numpy.lib.format can be used to read only each array's header (shape, fortran_order, dtype) without loading the data.
Example
import zipfile
import numpy as np
def read_metadata(file_name):
    """Return (name, shape, fortran_order, dtype) for each array in an .npz file."""
    metadata = []
    with zipfile.ZipFile(file_name, mode='r') as zip_file:
        for arr_name in zip_file.namelist():
            with zip_file.open(arr_name, 'r') as fp:
                # The magic string carries the .npy format version as (major, minor).
                version = np.lib.format.read_magic(fp)
                if version[0] == 1:
                    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)
                elif version[0] == 2:
                    shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(fp)
                else:
                    # Skip this entry; otherwise the header variables would be undefined.
                    print("File format not detected!")
                    continue
                metadata.append((arr_name, shape, fortran_order, dtype))
    return metadata
read_metadata('tmp/my_arrays.npz')
#[('arr1.npy', (10000,), False, dtype('int32')),
# ('arr2.npy', (10000,), False, dtype('int32'))]
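For a lookup closer to what the question asks for, here is a minimal self-contained sketch of the same header-only approach that returns a dictionary keyed by array name (with the .npy suffix stripped). The function name npz_shapes and the temporary file are only illustrative and are not part of NumPy. Entries whose format version is not 1 or 2 are simply skipped, which is an assumption of this sketch.

```python
import os
import tempfile
import zipfile

import numpy as np


def npz_shapes(file_name):
    """Map each array name in an .npz file to its shape, reading headers only.

    Hypothetical helper for illustration; not a NumPy function.
    """
    shapes = {}
    with zipfile.ZipFile(file_name, mode="r") as zf:
        for member in zf.namelist():
            if not member.endswith(".npy"):
                continue  # ignore anything that is not a stored array
            with zf.open(member, "r") as fp:
                major, _minor = np.lib.format.read_magic(fp)
                if major == 1:
                    shape, _, _ = np.lib.format.read_array_header_1_0(fp)
                elif major == 2:
                    shape, _, _ = np.lib.format.read_array_header_2_0(fp)
                else:
                    continue  # version not handled by this sketch
            shapes[member[:-len(".npy")]] = shape
    return shapes


# Write a small archive to a temporary directory and read back only the shapes.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_arrays.npz")
    np.savez(path, arr1=np.arange(10000), arr2=np.zeros((3, 4)))
    shapes = npz_shapes(path)

print(shapes["arr1"])
print(shapes["arr2"])
```

With this, shapes['arr1'] plays the role of the my_arrays.arr1.shape lookup from the question, and only the small .npy headers are read from the archive.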