Extracting specific data from multiple DICOM files

I have multiple DICOM files with similar data, for example:

(0008, 0023) Content Date DA: '20200209'
(0008, 0033) Content Time TM: '192356.853736'
(0010, 0010) Patient's Name PN: 'Patient_2'
(0010, 0020) Patient ID LO: '123456'
(0018, 1151) X-Ray Tube Current IS: '640'
(0020, 0013) Instance Number IS: '97'

Dataset.file_meta -------------------------------
(0002, 0002) Media Storage SOP Class UID UI: CT Image Storage
(0002, 0003) Media Storage SOP Instance UID UI: 1.2.3
(0002, 0012) Implementation Class UID UI: 1.2.3.4

From these files, I want to extract the 'Patient ID', 'Patient's Name', 'Instance Number' and 'X-Ray Tube Current' elements into a DataFrame, with one DataFrame column per element. How can I do this for multiple DICOM files held in one list?

CodePudding user response:

You can do that with the pydicom library, for instance:

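import pydicom

tags_list = []  # create once, outside the loop, so rows accumulate
for path in dicom_paths:
    dicom = pydicom.dcmread(path, force=True)
    # Skip files without a slice location, and localizer (scout) images
    if hasattr(dicom, "SliceLocation") and dicom.SliceLocation:
        if "LOCALIZER" not in dicom.ImageType:
            tags = {
                # str() gives pandas a plain string rather than a name object
                "Patient Name": str(dicom.PatientName),
                # Optional (Type 3) attribute: .get() returns None if absent
                "XRayTubeCurrent": dicom.get("XRayTubeCurrent"),
                "InstanceNumber": dicom.InstanceNumber,
                "Series Description": dicom.SeriesDescription,
                "Image Type": "/".join(dicom.ImageType),
                "Pixel Spacing": list(dicom.PixelSpacing),
                "Rows": dicom.Rows,
                "Columns": dicom.Columns,
                "Image Orientation Patient": list(dicom.ImageOrientationPatient),
                "Series Instance UID": str(dicom.SeriesInstanceUID),
            }
            # Append inside the if blocks so skipped files add no row
            tags_list.append(tags)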

Here dicom_paths is a list holding the paths of your DICOM files, for instance pathlib.Path objects.
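One way to build such a list, as a minimal sketch. It assumes the files sit under a single folder and use a .dcm extension, which is only a convention and not something DICOM requires. The helper name find_dicom_paths and its pattern argument are hypothetical.

```python
from pathlib import Path


def find_dicom_paths(folder, pattern="*.dcm"):
    """Return a sorted list of files under `folder` matching `pattern`.

    Hypothetical helper: pass pattern="*" if your files have no extension.
    """
    root = Path(folder)
    # rglob also searches subfolders; is_file() drops directories
    return sorted(p for p in root.rglob(pattern) if p.is_file())
```

Calling dicom_paths = find_dicom_paths("my_series") would then give a list that can be fed to the loop above; "my_series" is a placeholder folder name.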

Finally, you can convert your list of dictionaries to a pandas DataFrame as follows:

import pandas as pd

tags_df = pd.DataFrame(tags_list)

Note that attributes like X-Ray Tube Current are optional (Type 3 attributes), so they may be missing from some files; accessing a missing attribute directly raises an AttributeError.
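A small sketch of guarding against such missing attributes with getattr and a default. To keep it runnable without any DICOM files, it uses a SimpleNamespace as a stand-in for a real dataset (an assumption made for illustration); the helper name read_optional is hypothetical.

```python
from types import SimpleNamespace


def read_optional(ds, keyword, default=None):
    """Return ds.<keyword> if the attribute is present, else `default`."""
    return getattr(ds, keyword, default)


# Stand-in for a dataset that has no XRayTubeCurrent (not a real pydicom object)
ds = SimpleNamespace(PatientID="123456", InstanceNumber=97)

row = {
    "PatientID": read_optional(ds, "PatientID"),
    "InstanceNumber": read_optional(ds, "InstanceNumber"),
    "XRayTubeCurrent": read_optional(ds, "XRayTubeCurrent"),  # missing -> None
}
print(row)
```

The missing value ends up as None in the row, so the DataFrame still gets a column for it.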

The two if statements in the loop let you skip files that are not regular slices, for instance scout (localizer) images.
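The same two checks can be pulled into one predicate, sketched below under a few assumptions. The name is_regular_slice is hypothetical; the test objects are SimpleNamespace stand-ins, not real datasets; and unlike the loop above, this version keeps a slice whose location is exactly 0.0 and treats a missing ImageType as "not a localizer".

```python
from types import SimpleNamespace


def is_regular_slice(ds):
    """True if ds has a slice location and is not a localizer image."""
    # `is None` rather than a truthiness test, so a location of 0.0 is kept
    if getattr(ds, "SliceLocation", None) is None:
        return False
    # Assumption: a dataset without ImageType is not treated as a localizer
    image_type = getattr(ds, "ImageType", None) or []
    return "LOCALIZER" not in image_type


# Stand-in objects for illustration only
axial = SimpleNamespace(SliceLocation=0.0, ImageType=["ORIGINAL", "PRIMARY", "AXIAL"])
scout = SimpleNamespace(SliceLocation=10.0, ImageType=["ORIGINAL", "PRIMARY", "LOCALIZER"])
print(is_regular_slice(axial), is_regular_slice(scout))
```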

Use your IDE's autocompletion to explore the attributes available on the dicom instance.

CodePudding user response:

Using pydicom with dict and list comprehensions, and guarding against missing data:

from pydicom import dcmread
from pathlib import Path
from pandas import DataFrame


folder = Path(r"C:\temp\smallCTseries")

datasets = [dcmread(fn) for fn in folder.glob("*")]
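# "XRayTubeCurrent" is the DICOM keyword for (0018, 1151); the sample output
# further down was produced with the name "TubeCurrent" instead
keywords = ["PatientID", "PatientName", "InstanceNumber", "XRayTubeCurrent"]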

data_dict = {
    keyword: [ds.get(keyword) for ds in datasets]
    for keyword in keywords
}

# Fix PatientName turning into tuple - pandas seems to see it as iterator
data_dict["PatientName"] = [str(x) for x in data_dict["PatientName"]]

# Construct DataFrame, with filename as row index
df = DataFrame(data_dict, index=[Path(ds.filename).name for ds in datasets])

print(df)

Using ds.get() ensures that no error is raised if a data element is missing; you simply get None.

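Output using a few files from a PCIR CT set looks like the following. This run used the name "TubeCurrent", which is not a DICOM keyword (the keyword for (0018, 1151) is XRayTubeCurrent), so that column is None throughout: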

      PatientID    PatientName InstanceNumber TubeCurrent
CT106  77654033  Doe^Archibald             18        None
CT136  77654033  Doe^Archibald            180        None
CT166  77654033  Doe^Archibald            181        None
CT196  77654033  Doe^Archibald            182        None
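The PatientName workaround in the code above hints at a more general point: values read from a dataset are not always plain Python types. A rough sketch of a hypothetical helper, to_plain, that normalises values before they go into a DataFrame. It assumes integer-like and float-like values should become int and float, multi-valued elements should become lists, and anything else should become a string. FakeName is a stand-in class for illustration, not a pydicom type.

```python
from collections.abc import Sequence


def to_plain(value):
    """Convert a value to None, int, float, str, or a list of those."""
    if value is None:
        return None
    if isinstance(value, int):
        return int(value)  # an int subclass becomes a plain int
    if isinstance(value, float):
        return float(value)
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return [to_plain(v) for v in value]  # multi-valued elements
    return str(value)  # names, UIDs and anything else


class FakeName:
    """Stand-in for a name-like object that is not a str (assumption)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


print(to_plain(FakeName("Doe^Archibald")))
print(to_plain([0.5, 0.5]))
```

With something like this, the per-column fix for PatientName could be applied to every value in data_dict instead, at the cost of treating all unknown types as strings.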