I am trying to read multiple .mat files in Python, but every time I get an error. This is my code:
folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
directs = sorted(listdir(folder))
labels = []
for file in directs:
    f = h5py.File(folder + file,'r')
    label = np.array(f.get("cjdata/label"))[0][0]
    labels.append(label)
labels = pd.Series(labels)
labels.shape
The error I am getting is:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-11-e7d73f54f73d> in <module>
3 labels = []
4 for file in directs:
----> 5 f = h5py.File(folder + file,'r')
6 label = np.array(f.get("cjdata/label"))[0][0]
7 labels.append(label)
~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
404 with phil:
405 fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
--> 406 fid = make_fid(name, mode, userblock_size,
407 fapl, fcpl=make_fcpl(track_order=track_order),
408 swmr=swmr)
~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
171 if swmr and swmr_support:
172 flags |= h5f.ACC_SWMR_READ
--> 173 fid = h5f.open(name, flags, fapl=fapl)
174 elif mode == 'r+':
175 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\h5f.pyx in h5py.h5f.open()
OSError: Unable to open file (file signature not found)
I have 5849 .mat files. Can anyone tell me where I am going wrong?
I used h5py to read the .mat files. I want to read the labels and images in each .mat file.
CodePudding user response:
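I believe the issue is in how folder and file are concatenated.
Two things about that:
- file is not a reserved keyword in Python 3, but it was a built-in name in Python 2 and is easy to confuse with a file object, so it is better not to use it as a variable name.
- Assuming you used os.listdir here (you didn't attach the import itself), it returns bare filenames, so plain concatenation only works when folder ends with a slash. Yours does, but os.path.join is more robust.
A fix for that (after I renamed file to filename):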
full_file_path = os.path.join(folder, filename)
f = h5py.File(full_file_path,'r')
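To illustrate the difference, here is a small sketch. The folder and file names are made up for the example, and the two helper functions are hypothetical. It shows that os.path.join gives a usable path whether or not the folder string ends with a separator, while plain concatenation does not:

```python
import os


def concat_path(folder, filename):
    # Plain string concatenation: correct only if folder already ends
    # with a path separator.
    return folder + filename


def joined_path(folder, filename):
    # os.path.join inserts a separator only when one is needed.
    return os.path.join(folder, filename)


# Hypothetical names, used only for illustration.
for folder in ("data/", "data"):
    print(repr(folder), concat_path(folder, "1.mat"), joined_path(folder, "1.mat"))
```

With the trailing slash both approaches agree; without it, only the joined path still points inside the folder.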
CodePudding user response:
Here are 4 areas where the code could be improved:
- I prefer the glob.iglob() function to get the files. It can use a wildcard to define the filenames, so anything in the folder that is not a .mat file is skipped, and it returns an iterator. That way you don't have to create a list with 5849 .mat filenames.
- You open the file with h5py.File(), but don't close it. That probably won't cause a problem, but it is bad practice. It's better to use Python's with/as: context manager. (If you don't do that, add f.close() inside the loop.)
- You are using the .get() method to retrieve the dataset object. That works, but .get() returns None when the name does not exist, which can hide mistakes. Documented practice is to reference the dataset by name, like this: f["cjdata/label"]
- Also, you added [0][0] after the dataset object. Are you sure you want to do that? Those indices access the single dataset value at index [0][0]. If you want to create a numpy array of all the dataset values, use label = f["cjdata/label"][()]
Modified code that demonstrates all of these changes below:
import glob
import h5py
import numpy as np
import pandas as pd

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
file_wc = folder + "*.mat"  # assumes filename extension is .mat
labels = []
for fname in glob.iglob(file_wc):
    with h5py.File(fname,'r') as f:
        # reference the dataset by name instead of calling .get():
        label = np.array(f["cjdata/label"][0][0])
        # or maybe just:
        # label = f["cjdata/label"][()]
        labels.append(label)
labels = pd.Series(labels)
labels.shape
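One more thing worth checking is the error message itself. "file signature not found" means the HDF5 library did not find an HDF5 signature in the file, so at least one file in the folder is not HDF5. h5py can only read .mat files saved in MATLAB's v7.3 format; older .mat files are usually read with scipy.io.loadmat instead. The sketch below uses only the standard library to sort the files by their header bytes. The function names and the demo files are hypothetical, and it only checks offsets 0 and 512, so treat it as a rough diagnostic (h5py.is_hdf5() is the more thorough check):

```python
import glob
import os
import tempfile

# First 8 bytes of an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_mat_format(path):
    """Guess a .mat file's container format from its header bytes."""
    with open(path, "rb") as fh:
        head = fh.read(520)
    # MATLAB v7.3 files put a 512-byte text header before the HDF5 data,
    # so check offset 512 as well as offset 0.
    if head[:8] == HDF5_SIGNATURE or head[512:520] == HDF5_SIGNATURE:
        return "hdf5"
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "mat-v5"
    return "unknown"


def split_by_format(folder):
    """Group the *.mat files in folder by the format sniffed above."""
    groups = {"hdf5": [], "mat-v5": [], "unknown": []}
    for path in glob.iglob(os.path.join(folder, "*.mat")):
        groups[sniff_mat_format(path)].append(os.path.basename(path))
    return {kind: sorted(names) for kind, names in groups.items()}


def make_demo_folder():
    """Create a throwaway folder with fake files (hypothetical names)."""
    folder = tempfile.mkdtemp()
    samples = {
        "new.mat": b"MATLAB 7.3 MAT-file".ljust(512, b" ") + HDF5_SIGNATURE,
        "old.mat": b"MATLAB 5.0 MAT-file".ljust(128, b" "),
        "empty.mat": b"",
        "notes.txt": b"not a mat file",
    }
    for name, payload in samples.items():
        with open(os.path.join(folder, name), "wb") as fh:
            fh.write(payload)
    return folder


print(split_by_format(make_demo_folder()))
```

Running split_by_format() on your real data folder should show which files h5py can open ("hdf5") and which ones trigger the OSError.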