How to read multiple .mat files from a folder in Python?

Time:01-09

I am trying to read multiple .mat files in Python, but every time I get an error. This is my code:

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
directs = sorted(listdir(folder))
labels = []
for file in directs:
    f = h5py.File(folder + file,'r')
    label = np.array(f.get("cjdata/label"))[0][0]
    labels.append(label)
labels = pd.Series(labels)
labels.shape

The error I am getting is:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-11-e7d73f54f73d> in <module>
      3 labels = []
      4 for file in directs:
----> 5     f = h5py.File(folder + file,'r')
      6     label = np.array(f.get("cjdata/label"))[0][0]
      7     labels.append(label)

~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
    404             with phil:
    405                 fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
--> 406                 fid = make_fid(name, mode, userblock_size,
    407                                fapl, fcpl=make_fcpl(track_order=track_order),
    408                                swmr=swmr)

~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    171         if swmr and swmr_support:
    172             flags |= h5f.ACC_SWMR_READ
--> 173         fid = h5f.open(name, flags, fapl=fapl)
    174     elif mode == 'r+':
    175         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (file signature not found)

I have 5849 mat files. Can anyone tell me where I am going wrong?

I used h5py to read the .mat files. I want to read the labels and images in each .mat file.

CodePudding user response:

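I believe the issue is in concatenating folder + file. 2 things about that:

  1. The name file is not a reserved keyword, but it was the name of a built-in type in Python 2, so it is best not to use it as a variable name.
  2. Assuming you used os.listdir here (the import is not shown), it returns bare file names, so each one has to be joined to the folder. Plain string concatenation only works when folder ends with a slash; os.path.join inserts the separator when it is needed.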

A fix for that (after I renamed file to filename):

full_file_path = os.path.join(folder, filename)
f = h5py.File(full_file_path,'r')
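As a minimal sketch of that fix, the helper below joins each name returned by os.listdir to the folder and keeps only regular files whose names end in .mat, so that sub-directories and stray files are never handed to h5py.File. The function name list_mat_files is hypothetical and not part of the original code.

```python
import os


def list_mat_files(folder):
    """Return sorted full paths of the .mat files directly inside folder."""
    paths = []
    for filename in sorted(os.listdir(folder)):
        full_file_path = os.path.join(folder, filename)
        # Skip sub-directories and anything without a .mat extension.
        if os.path.isfile(full_file_path) and filename.lower().endswith(".mat"):
            paths.append(full_file_path)
    return paths
```

Each returned path can then be passed to h5py.File(path, 'r') inside the loop.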

CodePudding user response:

Here are 4 areas where the code could be improved:

  1. I prefer the glob.iglob() method to get the files. It can use a wildcard to define the filenames, and it is a generator, so you don't have to create a list with 5849 .mat filenames.
  2. You open the file with h5py.File() but don't close it. That probably won't cause a problem, but it is bad practice. It's better to use Python's with/as context manager. (If you don't do that, add f.close() inside the loop.)
  3. You are using the .get() method to retrieve the dataset object. It works, but it returns None when the name does not exist, which hides mistakes in the path. The documented, more common practice is to reference the dataset by name, like this: f["cjdata/label"], which raises a KeyError for a missing name.
  4. Also, you added [0][0] after the dataset object. Are you sure you want to do that? Those are indices that access the single dataset value at [0][0]. If you want a numpy array of all the dataset values, use label = f["cjdata/label"][()]
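Point 1 can be looked at in isolation. The sketch below wraps glob.iglob in a small helper (iter_mat_paths is a hypothetical name) that hands back an iterator rather than a list. One caveat worth knowing: glob returns matches in arbitrary order, so if the labels must follow sorted filename order, as with sorted(listdir(folder)) in the question, wrap the call in sorted(), which does build the full list.

```python
import glob
import os


def iter_mat_paths(folder):
    """Return an iterator over paths matching <folder>/*.mat."""
    # iglob produces matches one at a time instead of building a list.
    return glob.iglob(os.path.join(folder, "*.mat"))
```

Typical use would be `for fname in iter_mat_paths(folder): ...`, or `for fname in sorted(iter_mat_paths(folder)): ...` when order matters.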

Modified code that demonstrates all of these changes below:

import glob

import h5py
import pandas as pd

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
file_wc = folder + "*.mat"  # assumes filename extension is .mat
labels = []
for fname in glob.iglob(file_wc):
    with h5py.File(fname, 'r') as f:
        # index the file by dataset path instead of calling .get():
        label = f["cjdata/label"][0][0]
        # or, to keep the whole dataset as a numpy array:
        # label = f["cjdata/label"][()]
        labels.append(label)
labels = pd.Series(labels)
labels.shape
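If some files still fail with "file signature not found" once the paths are right, h5py is being given a file it does not recognise as HDF5. Only .mat files saved in MATLAB's v7.3 format are HDF5; older formats are not, and are usually read with scipy.io.loadmat instead. A rough sketch for telling them apart, assuming the text header MATLAB normally writes at the start of a .mat file ("MATLAB 7.3 MAT-file" or "MATLAB 5.0 MAT-file"); the helper name mat_format and its return labels are hypothetical:

```python
def mat_format(path):
    """Guess the MAT-file flavour from the text header at the start of the file."""
    with open(path, "rb") as fh:
        header = fh.read(19)  # length of b"MATLAB 7.3 MAT-file"
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3"      # HDF5-based, readable with h5py
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5"        # not HDF5, try scipy.io.loadmat
    return "unknown"       # possibly not a MAT-file at all
```

Running this over the folder and counting the results should show quickly whether the whole dataset or only a few stray files are responsible for the error.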