Findings | Impression | File_name_Location |
---|---|---|
Lung bases: No pulmonary nodules or evidence of pneumonia | No findings on the current CT to account for the patient's clinical complaint of abdominal pain. | /home/text_file/p123456.txt |
I have a pandas DataFrame with three columns taken from chest X-ray reports: "findings", "impression" and "file_Name" (which includes the directory information). I also have a separate set of directories (folders) of chest X-ray images. I have to crawl through them to find the image file that matches each "file_Name" (the image directories contain more files than my text DataFrame has rows) and put its path in the same row of the DataFrame above. The image file name should be matched with the text file name.
I need code to solve this.
An example image file path is:
/home/files/f1/images/i123456.jpg
There are folders from f1 to f25, each containing hundreds of .jpg files.
Update: Corralien's code raised an exception:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3803, in Index.get_loc(self, key, method, tolerance)
3802 try:
-> 3803 return self._engine.get_loc(casted_key)
3804 except KeyError as err:
File ~/miniconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File ~/miniconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'File_name_Location'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[79], line 9
6 file = f"{img.stem[1:]}.txt"
7 images[file] = str(img)
----> 9 df['Image_name_Location']=df['File_name_Location'].str.split('/').str[-1].map(images)
File ~/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py:3805, in DataFrame.__getitem__(self, key)
3803 if self.columns.nlevels > 1:
3804 return self._getitem_multilevel(key)
-> 3805 indexer = self.columns.get_loc(key)
3806 if is_integer(indexer):
3807 indexer = [indexer]
File ~/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key, method, tolerance)
3803 return self._engine.get_loc(casted_key)
3804 except KeyError as err:
-> 3805 raise KeyError(key) from err
3806 except TypeError:
3807 # If we have a listlike key, _check_indexing_error will raise
3808 # InvalidIndexError. Otherwise we fall through and re-raise
3809 # the TypeError.
3810 self._check_indexing_error(key)
KeyError: 'File_name_Location'
CodePudding user response:
IIUC, there is a relation between text and image files: p123456.txt -> f??/images/i123456.jpg.
You can use the following code:
import pathlib

# create an index of your images with the above relation
images = {}
for img in pathlib.Path('/home/files').glob('f*/images/*.jpg'):
    # i123456.jpg -> p123456.txt
    file = f"p{img.stem[1:]}.txt"
    images[file] = str(img)

# the column name must match df.columns exactly, otherwise pandas raises a KeyError
df['Image_name_Location'] = df['File_name_Location'].str.split('/').str[-1].map(images)
Output:
>>> df
            File_name_Location                 Image_name_Location
0  /home/text_file/p123456.txt   /home/files/f1/images/i123456.jpg
1  /home/text_file/p987654.txt  /home/files/f22/images/i987654.jpg
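The KeyError: 'File_name_Location' in the question's update most likely means the DataFrame has no column with exactly that name (the question describes it as "file_Name"), so it is worth checking `df.columns` first. Below is a minimal, self-contained sketch of the same approach that takes the column name as a parameter. The helper names `build_image_index` and `add_image_column`, and the temporary directory standing in for /home/files, are assumptions made for illustration only.

```python
import pathlib
import tempfile

import pandas as pd


def build_image_index(root):
    """Map text-file names (p123456.txt) to image paths (.../i123456.jpg)."""
    index = {}
    for img in pathlib.Path(root).glob("f*/images/*.jpg"):
        # Drop the leading "i" from the image stem and rebuild the text-file name.
        index[f"p{img.stem[1:]}.txt"] = str(img)
    return index


def add_image_column(df, index, text_col):
    """Return a copy of df with an Image_name_Location column added."""
    if text_col not in df.columns:
        # Same failure as in the traceback, but with the available columns listed.
        raise KeyError(f"{text_col!r} not found; columns are {list(df.columns)}")
    out = df.copy()
    # Keep only the file name of each text path, then look it up in the index.
    out["Image_name_Location"] = out[text_col].str.split("/").str[-1].map(index)
    return out


# Demo on a throwaway directory tree (hypothetical stand-in for /home/files).
with tempfile.TemporaryDirectory() as tmp:
    for folder, name in [("f1", "i123456.jpg"), ("f22", "i987654.jpg"), ("f3", "i555555.jpg")]:
        images_dir = pathlib.Path(tmp, folder, "images")
        images_dir.mkdir(parents=True)
        (images_dir / name).touch()

    index = build_image_index(tmp)
    df = pd.DataFrame(
        {"file_Name": ["/home/text_file/p123456.txt", "/home/text_file/p000000.txt"]}
    )
    result = add_image_column(df, index, "file_Name")
```

Text files without a matching image get NaN in the new column, and images without a matching text file are simply ignored, which fits the case where the image folders hold more files than the DataFrame has rows.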