I have multiple DICOM files with similar data, for example:
(0008, 0023) Content Date DA: '20200209'
(0008, 0033) Content Time TM: '192356.853736'
(0010, 0010) Patient's Name PN: 'Patient_2'
(0010, 0020) Patient ID LO: '123456'
(0018, 1151) X-Ray Tube Current IS: '640'
(0020, 0013) Instance Number IS: '97'
Dataset.file_meta -------------------------------
(0002, 0002) Media Storage SOP Class UID UI: CT Image Storage
(0002, 0003) Media Storage SOP Instance UID UI: 1.2.3
(0002, 0012) Implementation Class UID UI: 1.2.3.4
From these files, I want to extract the 'Patient ID', 'Patient Name', 'Instance Number' and 'Tube Current' tags into a DataFrame, with one DataFrame column per tag. How can I do this for a list of multiple DICOM files?
CodePudding user response:
You can do that easily with the pydicom library, for instance:
import pydicom

tags_list = []  # create the list once, before the loop, so rows accumulate
for path in dicom_paths:
    dicom = pydicom.dcmread(path, force=True)
    # Keep only files that carry a slice location (0.0 is a valid location)
    if dicom.get("SliceLocation") is not None:
        # Skip localizer (scout) images
        if "LOCALIZER" not in dicom.ImageType:
            tags = {
                "Patient ID": dicom.PatientID,
                "Patient Name": str(dicom.PatientName),
                # Optional (type 3) attribute: .get() returns None if it is absent
                "XRayTubeCurrent": dicom.get("XRayTubeCurrent"),
                "InstanceNumber": dicom.InstanceNumber,
                "Series Description": dicom.SeriesDescription,
                "Image Type": "/".join(dicom.ImageType),
                "Pixel Spacing": list(dicom.PixelSpacing),
                "Rows": dicom.Rows,
                "Columns": dicom.Columns,
                "Image Orientation Patient": list(dicom.ImageOrientationPatient),
                "Series Instance UID": str(dicom.SeriesInstanceUID),
            }
            tags_list.append(tags)
Here dicom_paths is a list of the paths to your DICOM files, for instance Path objects from the pathlib library.
Finally, you can convert the list of dictionaries to a pandas DataFrame as follows:
import pandas as pd

tags_df = pd.DataFrame(tags_list)
Note that attributes such as X-Ray Tube Current are optional (type 3), so they may be missing from some files; accessing a missing attribute directly raises an AttributeError.
The two if statements let you skip, for instance, scout images.
Use your IDE's autocompletion to see all the attributes available on the dicom instance.
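The two checks described above can also be written as a small reusable predicate. The following is only a sketch: the name is_image_slice is hypothetical, the SimpleNamespace objects stand in for pydicom datasets so that the example runs without any DICOM files, and it assumes a slice location of 0 should be kept, so it tests for None rather than truthiness.

```python
from types import SimpleNamespace


def is_image_slice(ds):
    """Return True for datasets that have a slice location and are not localizers."""
    # getattr with a default avoids an AttributeError when the element is absent.
    if getattr(ds, "SliceLocation", None) is None:
        return False
    return "LOCALIZER" not in getattr(ds, "ImageType", [])


# Stand-ins for pydicom datasets (hypothetical values, for illustration only).
axial = SimpleNamespace(SliceLocation=-12.5, ImageType=["ORIGINAL", "PRIMARY", "AXIAL"])
scout = SimpleNamespace(SliceLocation=0.0, ImageType=["ORIGINAL", "PRIMARY", "LOCALIZER"])
no_location = SimpleNamespace(ImageType=["DERIVED", "SECONDARY"])

kept = [ds for ds in (axial, scout, no_location) if is_image_slice(ds)]
print(len(kept))  # 1
```

With real files, the same function could be applied to each dataset returned by pydicom.dcmread before its tags are collected.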
CodePudding user response:
Using pydicom with dict and list comprehensions, and guarding against missing data:
from pathlib import Path

from pandas import DataFrame
from pydicom import dcmread

folder = Path(r"C:\temp\smallCTseries")
datasets = [dcmread(fn) for fn in folder.glob("*")]

# Column name -> DICOM keyword. The keyword for (0018, 1151) is
# "XRayTubeCurrent"; "TubeCurrent" is not a DICOM keyword.
columns = {
    "PatientID": "PatientID",
    "PatientName": "PatientName",
    "InstanceNumber": "InstanceNumber",
    "TubeCurrent": "XRayTubeCurrent",
}
data_dict = {
    column: [ds.get(keyword) for ds in datasets]
    for column, keyword in columns.items()
}
# Convert PatientName to str; otherwise pandas treats the value as an iterable
data_dict["PatientName"] = [str(x) for x in data_dict["PatientName"]]
# Construct DataFrame, with filename as row index
df = DataFrame(data_dict, index=[Path(ds.filename).name for ds in datasets])
print(df)
ds.get() ensures that no error is raised if a data element is missing; you simply get None.
Output using a few files from a PCIR CT set looks like:
       PatientID    PatientName  InstanceNumber TubeCurrent
CT106   77654033  Doe^Archibald              18        None
CT136   77654033  Doe^Archibald             180        None
CT166   77654033  Doe^Archibald             181        None
CT196   77654033  Doe^Archibald             182        None
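To see how the .get() guard behaves without needing any DICOM files, here is a minimal sketch that uses plain dicts as stand-ins for datasets. This is an assumption for illustration: like pydicom's Dataset.get(), dict.get() returns the supplied default when the key is missing. The helper name build_columns and the second row are hypothetical; the first row reuses the values from the question.

```python
# Plain dicts stand in for pydicom datasets (illustration only).
datasets = [
    {"PatientID": "123456", "InstanceNumber": 97, "XRayTubeCurrent": 640},
    {"PatientID": "123456", "InstanceNumber": 98},  # no tube current recorded
]
keywords = ["PatientID", "InstanceNumber", "XRayTubeCurrent"]


def build_columns(datasets, keywords, default=None):
    """Return {keyword: [one value per dataset]}, using default for missing elements."""
    return {kw: [ds.get(kw, default) for ds in datasets] for kw in keywords}


columns = build_columns(datasets, keywords)
print(columns["XRayTubeCurrent"])  # [640, None]
```

The resulting dict of lists has the same shape as data_dict above, so it can be passed to DataFrame in the same way.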