How to read all numpy files (.npy) from directory at once without For loop?


I have 1970 .npy files in the vid_frames directory; each file contains 20 frames of the MSVD dataset. I need to load all of these files at once as a tensor dataset. When I use np_read = np.load(all_npy_path), I get this error:

TypeError: expected str, bytes or os.PathLike object, not Tensor

where all_npy_path contains all the .npy paths as a tensor:

all_npy_path =
['vid_frames/m1NR0uNNs5Y_104_110.avi.npy',
 'vid_frames/9Q0JfdP36kI_23_28.avi.npy',
 'vid_frames/WTf5EgVY5uU_18_23.avi.npy',
 'vid_frames/WZTGqvbqOFE_28_34.avi.npy', ..... ]

CodePudding user response:

You must use a for loop for this, and the overhead of the loop is negligible compared to the time taken to read the data from disk.
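For reference, the plain sequential version is a single comprehension; np.stack then combines the per-file arrays into one array (this assumes every file holds an array of the same shape):

import numpy as np

# Read each file in turn and stack the results into one array of shape
# (num_files, 20, H, W, C), assuming all files share the same frame shape.
arrays = np.stack([np.load(path) for path in all_npy_path])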

You can use threading to speed up the process and achieve the maximum I/O speed. But in the future you might want to switch to using sqlite3 for faster I/O without threading (a sketch follows the code below).

from multiprocessing.pool import ThreadPool
import numpy as np

all_npy_path = [
    'vid_frames/m1NR0uNNs5Y_104_110.avi.npy',
    'vid_frames/9Q0JfdP36kI_23_28.avi.npy',
    'vid_frames/WTf5EgVY5uU_18_23.avi.npy',
    'vid_frames/WZTGqvbqOFE_28_34.avi.npy',
]

# Map np.load over all paths with a pool of worker threads (one per CPU
# by default); disk reads release the GIL, so the loads overlap.
with ThreadPool() as pool:
    arrays_list = pool.map(np.load, all_npy_path)

Note: pool.map is still a for loop under the hood; it is just multithreaded to run faster.
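As for the sqlite3 suggestion, here is a minimal sketch of the idea, not a definitive implementation: serialize each array to a BLOB once, then read them all back with a single query. The database file name (frames.db) and table name (frames) are hypothetical.

import io
import sqlite3
import numpy as np

def to_blob(arr):
    # Serialize an array to .npy bytes in memory
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()

def from_blob(blob):
    # Deserialize .npy bytes back into an array
    return np.load(io.BytesIO(blob))

con = sqlite3.connect("frames.db")  # hypothetical database file
con.execute("CREATE TABLE IF NOT EXISTS frames (path TEXT PRIMARY KEY, data BLOB)")

# One-time migration: copy every .npy file into the database
for path in all_npy_path:
    con.execute("INSERT OR REPLACE INTO frames VALUES (?, ?)",
                (path, to_blob(np.load(path))))
con.commit()

# Afterwards, all arrays come back in one query instead of 1970 file opens
arrays_list = [from_blob(row[0]) for row in con.execute("SELECT data FROM frames")]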

CodePudding user response:

The following code solved the problem:

import numpy as np
import tensorflow as tf

def decode_and_resize(img_path):
    # tf.py_function runs plain Python inside the graph: the incoming
    # tf.string tensor is converted back to a Python str for np.load.
    tensor = tf.py_function(
        func=lambda path: np.load(path.numpy().decode("utf-8")),
        inp=[img_path],
        Tout=tf.float32
    )
    # py_function loses static shape information, so restore it here;
    # IMAGE_SIZE_np is the known shape of one loaded array.
    tensor.set_shape(IMAGE_SIZE_np)

    return tensor
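For completeness, a wrapper like this is normally applied with tf.data.Dataset.map, which is also how the string paths end up inside the function as Tensors (the cause of the original error). A sketch, reusing the all_npy_path list from the question:

import tensorflow as tf

# Each path becomes a tf.string tensor; decode_and_resize loads the array
dataset = (
    tf.data.Dataset.from_tensor_slices(all_npy_path)
    .map(decode_and_resize, num_parallel_calls=tf.data.AUTOTUNE)
)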