I have 1970 .npy files containing features for the MSVD dataset. I want to create a single .hdf5
file from these NumPy files.
import os
import numpy as np
import h5py  # the Python HDF5 bindings are imported as h5py, not hdf5
TRAIN_FEATURE_DIR = "MSVD"
for filename in os.listdir(TRAIN_FEATURE_DIR):
    f = np.load(os.path.join(TRAIN_FEATURE_DIR, filename))
    ...
CodePudding user response:
Creating a dataset from an array is easy. The example below loops over all .npy
files in a folder and creates one dataset per array. (FYI, I prefer glob.iglob()
for getting the filenames with a wildcard.) Each dataset name is the same as its filename.
import glob
import numpy as np
import h5py
with h5py.File('SO_74788877.h5','w') as h5f:
    for filename in glob.iglob('*.npy'):
        arr = np.load(filename)
        h5f.create_dataset(filename,data=arr)
The following code shows how to access the dataset names, shapes, and dtypes in the H5 file created above
(dataset is an h5py dataset object, which behaves like a NumPy array in many situations):
with h5py.File('SO_74788877.h5','r') as h5f:
    for name, dataset in h5f.items():
        print(name, dataset.shape, dataset.dtype)
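To make the "behaves like a numpy array" remark concrete, here is a minimal sketch of reading values back out of a dataset. The file name demo.h5, the dataset name feat, and the function roundtrip_demo are made up for illustration and are not part of the code above; the file is written to a temporary folder so nothing is left behind.

```python
import os
import tempfile

import h5py
import numpy as np


def roundtrip_demo():
    """Write one small array to a throwaway HDF5 file and read it back."""
    arr = np.arange(12, dtype=np.float32).reshape(3, 4)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")  # hypothetical file name
        with h5py.File(path, "w") as h5f:
            h5f.create_dataset("feat", data=arr)  # hypothetical dataset name
        with h5py.File(path, "r") as h5f:
            dset = h5f["feat"]
            first_row = dset[0]  # slicing reads just that part as an ndarray
            full = dset[()]      # reads the whole dataset into an ndarray
    return arr, first_row, full


if __name__ == "__main__":
    original, first_row, full = roundtrip_demo()
    print(first_row.shape, full.shape, full.dtype)
```

Note that the values have to be read while the file is still open; once the with block ends, the dataset object can no longer be used, but the arrays already read from it remain valid.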
CodePudding user response:
The following code solved my problem:
import os
import numpy as np
import h5py
TRAIN_FEATURE_DIR = "MSVD" # MSVD ==> numpy folder path
h5 = h5py.File("out.hdf5", 'w') # out ==> output hdf5 file name
for filename in os.listdir(TRAIN_FEATURE_DIR):
    video_id = os.path.splitext(filename)[0] # optional, to remove '.npy'
    video_id = video_id.split('.')[0] # optional, to remove '.avi' from video_id
    f = np.load(os.path.join(TRAIN_FEATURE_DIR, filename))
    h5[video_id] = f
h5.close()
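For anyone reusing this, the same loop could also be written as a small function, sketched below. This is only one possible variant, not a required change: the function name npy_dir_to_hdf5 is made up, the with block is there so the file gets closed even if a load fails partway, and skipping files that do not end in .npy is an added assumption about what may be in the folder.

```python
import os

import h5py
import numpy as np


def npy_dir_to_hdf5(feature_dir, out_path):
    """Store every .npy file in feature_dir as one dataset in out_path."""
    with h5py.File(out_path, "w") as h5:
        for filename in sorted(os.listdir(feature_dir)):
            if not filename.endswith(".npy"):
                continue  # assumption: ignore anything that is not a .npy file
            # 'abc.avi.npy' -> 'abc' (drops both '.npy' and '.avi')
            video_id = filename.split(".")[0]
            h5[video_id] = np.load(os.path.join(feature_dir, filename))
    return out_path
```

It would be called as npy_dir_to_hdf5("MSVD", "out.hdf5"), matching the folder and output names used above.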