How to parse XML with xml.etree.ElementTree?

Time:06-07

I am trying to parse an XML document with xml.etree.ElementTree, but I am not able to extract "ArticleTitle" into a variable and "DescriptorName" into a list of strings. I debugged the code and root always appears empty. It seems the code finds the tags but not the content I am trying to fetch.

Furthermore, the code below only tries to fetch the text of the "DescriptorName" tags, but in a second step I also need the "UI" attribute (for example, "D000368") of a given DescriptorName tag. I'd appreciate any hints on how to access these values, too.

import xml.etree.ElementTree as ET

# efetch comes from a module that queries the PubMed API. The response content is XML; see the second code snippet below.
response = efetch(['35590280', '35590281'])

root = ET.fromstring(response.content)

for article in root.findall('PubmedArticle'):
    article_title = article.find('ArticleTitle')
    meshcodes = article.findall('DescriptorName')
    print(article_title, meshcodes)

Output console:

None []
None []

As I understand it, the root should be <PubmedArticleSet> and its children should be <PubmedArticle> elements. One <PubmedArticleSet> can contain hundreds of <PubmedArticle> elements.

<?xml version="1.0" ?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2019//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">
<PubmedArticleSet>
    <PubmedArticle>
        <MedlineCitation Status="MEDLINE" IndexingMethod="Automated" Owner="NLM">
            <PMID Version="1">35590280</PMID>
            <DateCompleted>
                <Year>2022</Year>
                <Month>05</Month>
                <Day>23</Day>
            </DateCompleted>
            <DateRevised>
                <Year>2022</Year>
                <Month>05</Month>
                <Day>23</Day>
            </DateRevised>
            <Article PubModel="Electronic">
                <Journal>
                    <ISSN IssnType="Electronic">1471-2474</ISSN>
                    <JournalIssue CitedMedium="Internet">
                        <Volume>23</Volume>
                        <Issue>1</Issue>
                        <PubDate>
                            <Year>2022</Year>
                            <Month>May</Month>
                            <Day>19</Day>
                        </PubDate>
                    </JournalIssue>
                    <Title>BMC musculoskeletal disorders</Title>
                    <ISOAbbreviation>BMC Musculoskelet Disord</ISOAbbreviation>
                </Journal>
                <ArticleTitle>Risk factors of fracture following curettage for bone giant cell tumors of the extremities.</ArticleTitle>
                <Pagination>
                    <MedlinePgn>477</MedlinePgn>
                </Pagination>
                <ELocationID EIdType="doi" ValidYN="Y">10.1186/s12891-022-05447-x</ELocationID>
                <Abstract>
                    <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Following curettage of giant cell tumor of bone (GCTB), it is common to fill the cavity with polymethylmethacrylate (PMMA) bone cement, bone allograft, or artificial bone to maintain bone strength; however, there is a 2-14% risk of postoperative fractures. We conducted this retrospective study to clarify the risk factors for fractures after curettage for GCTB of the extremities.</AbstractText>
                    <AbstractText Label="METHODS" NlmCategory="METHODS">This study included 284 patients with GCTBs of the extremities who underwent curettage at our institutions between 1980 and 2018 after excluding patients whose cavities were not filled with anything or who had additional plate fixation. The tumor cavity was filled with PMMA bone cement alone (n = 124), PMMA bone cement and bone allograft (n = 81), bone allograft alone (n = 63), or hydroxyapatite graft alone (n = 16).</AbstractText>
                    <AbstractText Label="RESULTS" NlmCategory="RESULTS">Fractures after curettage occurred in 10 (3.5%) patients, and the median time from the curettage to fracture was 3.5 months (interquartile range [IQR], 1.8-8.3 months). The median postoperative follow-up period was 86.5 months (IQR, 50.3-118.8 months). On univariate analysis, patients who had GCTB of the proximal or distal femur (1-year fracture-free survival, 92.5%; 95% confidence interval [CI]: 85.8-96.2) presented a higher risk for postoperative fracture than those who had GCTB at another site (100%; p = 0.0005). Patients with a pathological fracture at presentation (1-year fracture-free survival, 88.2%; 95% CI: 63.2-97.0) presented a higher risk for postoperative fracture than those without a pathological fracture at presentation (97.8%; 95% CI: 95.1-99.0; p = 0.048). Patients who received bone grafting (1-year fracture-free survival, 99.4%; 95% CI: 95.7-99.9) had a lower risk of postoperative fracture than those who did not receive bone grafting (94.4%; 95% CI: 88.7-97.3; p = 0.003).</AbstractText>
                    <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">For GCTBs of the femur, especially those with pathological fracture at presentation, bone grafting after curettage is recommended to reduce the risk of postoperative fracture. Additional plate fixation should be considered when curettage and cement filling without bone grafting are performed in patients with GCTB of the femur. This should be specially performed for those patients with a pathological fracture at presentation.</AbstractText>
                    <CopyrightInformation>© 2022. The Author(s).</CopyrightInformation>
                </Abstract>
                <AuthorList CompleteYN="Y">
                    <Author ValidYN="Y">
                        <LastName>Tsukamoto</LastName>
                        <ForeName>Shinji</ForeName>
                        <Initials>S</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Orthopaedic Surgery, Nara Medical University, 840g, Shijo-cho, Kashihara-city, Nara, 634-8521, Japan. [email protected].</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Mavrogenis</LastName>
                        <ForeName>Andreas F</ForeName>
                        <Initials>AF</Initials>
                        <AffiliationInfo>
                            <Affiliation>First Department of Orthopaedics, National and Kapodistrian University of Athens, School of Medicine, 41 Ventouri Street, 15562 Holargos, Athens, Greece.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Akahane</LastName>
                        <ForeName>Manabu</ForeName>
                        <Initials>M</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Health and Welfare Services, National Institute of Public Health, 2-3-6 Minami, Wako-shi, Saitama, 351-0197, Japan.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Honoki</LastName>
                        <ForeName>Kanya</ForeName>
                        <Initials>K</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Orthopaedic Surgery, Nara Medical University, 840g, Shijo-cho, Kashihara-city, Nara, 634-8521, Japan.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Kido</LastName>
                        <ForeName>Akira</ForeName>
                        <Initials>A</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Rehabilitation Medicine, Nara Medical University, 840, Shijo-cho, Kashihara-city, Nara, 634-8521, Japan.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Tanaka</LastName>
                        <ForeName>Yasuhito</ForeName>
                        <Initials>Y</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Orthopaedic Surgery, Nara Medical University, 840g, Shijo-cho, Kashihara-city, Nara, 634-8521, Japan.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Donati</LastName>
                        <ForeName>Davide Maria</ForeName>
                        <Initials>DM</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Orthopaedic Oncology, IRCCS Istituto Ortopedico Rizzoli, Via Pupilli 1, 40136, Bologna, Italy.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                    <Author ValidYN="Y">
                        <LastName>Errani</LastName>
                        <ForeName>Costantino</ForeName>
                        <Initials>C</Initials>
                        <AffiliationInfo>
                            <Affiliation>Department of Orthopaedic Oncology, IRCCS Istituto Ortopedico Rizzoli, Via Pupilli 1, 40136, Bologna, Italy.</Affiliation>
                        </AffiliationInfo>
                    </Author>
                </AuthorList>
                <Language>eng</Language>
                <PublicationTypeList>
                    <PublicationType UI="D016428">Journal Article</PublicationType>
                </PublicationTypeList>
                <ArticleDate DateType="Electronic">
                    <Year>2022</Year>
                    <Month>05</Month>
                    <Day>19</Day>
                </ArticleDate>
            </Article>
            <MedlineJournalInfo>
                <Country>England</Country>
                <MedlineTA>BMC Musculoskelet Disord</MedlineTA>
                <NlmUniqueID>100968565</NlmUniqueID>
                <ISSNLinking>1471-2474</ISSNLinking>
            </MedlineJournalInfo>
            <ChemicalList>
                <Chemical>
                    <RegistryNumber>0</RegistryNumber>
                    <NameOfSubstance UI="D001843">Bone Cements</NameOfSubstance>
                </Chemical>
                <Chemical>
                    <RegistryNumber>9011-14-7</RegistryNumber>
                    <NameOfSubstance UI="D019904">Polymethyl Methacrylate</NameOfSubstance>
                </Chemical>
            </ChemicalList>
            <CitationSubset>IM</CitationSubset>
            <MeshHeadingList>
                <MeshHeading>
                    <DescriptorName UI="D001843" MajorTopicYN="N">Bone Cements</DescriptorName>
                    <QualifierName UI="Q000009" MajorTopicYN="N">adverse effects</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D001859" MajorTopicYN="Y">Bone Neoplasms</DescriptorName>
                    <QualifierName UI="Q000473" MajorTopicYN="N">pathology</QualifierName>
                    <QualifierName UI="Q000601" MajorTopicYN="N">surgery</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D003475" MajorTopicYN="N">Curettage</DescriptorName>
                    <QualifierName UI="Q000009" MajorTopicYN="N">adverse effects</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D005121" MajorTopicYN="N">Extremities</DescriptorName>
                    <QualifierName UI="Q000473" MajorTopicYN="N">pathology</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D005598" MajorTopicYN="Y">Fractures, Spontaneous</DescriptorName>
                    <QualifierName UI="Q000000981" MajorTopicYN="N">diagnostic imaging</QualifierName>
                    <QualifierName UI="Q000453" MajorTopicYN="N">epidemiology</QualifierName>
                    <QualifierName UI="Q000209" MajorTopicYN="N">etiology</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D018212" MajorTopicYN="Y">Giant Cell Tumor of Bone</DescriptorName>
                    <QualifierName UI="Q000473" MajorTopicYN="N">pathology</QualifierName>
                    <QualifierName UI="Q000601" MajorTopicYN="N">surgery</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D009364" MajorTopicYN="N">Neoplasm Recurrence, Local</DescriptorName>
                    <QualifierName UI="Q000601" MajorTopicYN="N">surgery</QualifierName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D019904" MajorTopicYN="N">Polymethyl Methacrylate</DescriptorName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D012189" MajorTopicYN="N">Retrospective Studies</DescriptorName>
                </MeshHeading>
                <MeshHeading>
                    <DescriptorName UI="D012307" MajorTopicYN="N">Risk Factors</DescriptorName>
                </MeshHeading>
            </MeshHeadingList>
            <KeywordList Owner="NOTNLM">
                <Keyword MajorTopicYN="N">Bone grafting</Keyword>
                <Keyword MajorTopicYN="N">Cement</Keyword>
                <Keyword MajorTopicYN="N">Curettage</Keyword>
                <Keyword MajorTopicYN="N">Denosumab</Keyword>
                <Keyword MajorTopicYN="N">Fracture</Keyword>
                <Keyword MajorTopicYN="N">Giant cell tumor of bone</Keyword>
            </KeywordList>
        </MedlineCitation>
        <PubmedData>
            <History>
                <PubMedPubDate PubStatus="received">
                    <Year>2021</Year>
                    <Month>10</Month>
                    <Day>18</Day>
                </PubMedPubDate>
                <PubMedPubDate PubStatus="accepted">
                    <Year>2022</Year>
                    <Month>05</Month>
                    <Day>17</Day>
                </PubMedPubDate>
                <PubMedPubDate PubStatus="entrez">
                    <Year>2022</Year>
                    <Month>5</Month>
                    <Day>19</Day>
                    <Hour>23</Hour>
                    <Minute>45</Minute>
                </PubMedPubDate>
                <PubMedPubDate PubStatus="pubmed">
                    <Year>2022</Year>
                    <Month>5</Month>
                    <Day>20</Day>
                    <Hour>6</Hour>
                    <Minute>0</Minute>
                </PubMedPubDate>
                <PubMedPubDate PubStatus="medline">
                    <Year>2022</Year>
                    <Month>5</Month>
                    <Day>24</Day>
                    <Hour>6</Hour>
                    <Minute>0</Minute>
                </PubMedPubDate>
            </History>
            <PublicationStatus>epublish</PublicationStatus>
            <ArticleIdList>
                <ArticleId IdType="pubmed">35590280</ArticleId>
                <ArticleId IdType="doi">10.1186/s12891-022-05447-x</ArticleId>
                <ArticleId IdType="pii">10.1186/s12891-022-05447-x</ArticleId>
                <ArticleId IdType="pmc">PMC9118605</ArticleId>
            </ArticleIdList>
            <ReferenceList>
                <Reference>
                    <Citation>Clin Orthop Relat Res. 2020 May;478(5):1076-1085</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">31794487</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Bone Joint Surg Am. 2018 Mar 21;100(6):496-504</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">29557866</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Surg Oncol. 2019 Jun;119(7):864-872</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">30734307</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Clin Orthop Relat Res. 2007 Jun;459:96-104</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">17417093</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Pediatr Orthop. 2014 Jan;34(1):92-100</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">23812148</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Indian J Orthop. 2007 Apr;41(2):109-14</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">21139761</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Orthopedics. 2020 Sep 1;43(5):284-291</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">32745221</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Am J Orthop (Belle Mead NJ). 2011 Jun;40(6):E105-9</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">21869943</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Orthop Res. 1989;7(4):579-84</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">2544712</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Biomed Mater Res. 2002 Mar 5;59(3):490-8</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">11774307</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Comput Biol Med. 2019 Sep;112:103360</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">31330318</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Bone Joint Surg Am. 2014 Mar 5;96(5):e35</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">24599207</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Eur J Orthop Surg Traumatol. 2017 Aug;27(6):813-819</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">28589498</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Bone Joint Surg Am. 2013 Nov 6;95(21):e159</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">24196471</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Orthopedics. 2014 Mar;37(3):158-62</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">24762144</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Clin Orthop Relat Res. 2013 Mar;471(3):820-9</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">22926445</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Clin Orthop Relat Res. 2017 Mar;475(3):776-783</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">26932739</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Sci Rep. 2020 Dec 7;10(1):21319</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">33288803</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Surg Oncol. 2021 Apr;123(5):1299-1303</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">33524202</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Arch Orthop Trauma Surg (1978). 1982;100(1):3-10</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">7125872</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Int Orthop. 2006 Apr;30(2):135-8</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">16474936</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Int Orthop. 2018 Jan;42(1):203-213</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">28988294</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Eur J Orthop Surg Traumatol. 2020 Jan;30(1):3-9</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">31520122</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Bone Joint Surg Am. 1987 Jan;69(1):106-14</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">3805057</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Clin Orthop Relat Res. 2007 May;458:159-67</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">17290156</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Lancet Oncol. 2013 Aug;14(9):901-8</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">23867211</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Orthop Traumatol. 2016 Sep;17(3):249-54</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">26883439</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Bone Joint Surg Am. 1994 Dec;76(12):1827-33</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">7989388</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Acta Orthop. 2009 Feb;80(1):4-8</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">19234881</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>J Orthop Res. 2002 May;20(3):464-72</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">12038619</ArticleId>
                    </ArticleIdList>
                </Reference>
                <Reference>
                    <Citation>Oncol Lett. 2013 May;5(5):1595-1598</Citation>
                    <ArticleIdList>
                        <ArticleId IdType="pubmed">23760940</ArticleId>
                    </ArticleIdList>
                </Reference>
            </ReferenceList>
        </PubmedData>
    </PubmedArticle>
    <PubmedArticle>
    ...
    </PubmedArticle>
</PubmedArticleSet>

CodePudding user response:

The final solution for the loop to access all the information I need is as follows:

for article in root.findall('PubmedArticle'):
    article_title = article.find("MedlineCitation//ArticleTitle").text
    meshcodes = [code.text for code in article.findall('MedlineCitation//DescriptorName')]
    meshcodes_ui = [code.attrib['UI'] for code in article.findall(
        'MedlineCitation//DescriptorName')]
    print(article_title, meshcodes, meshcodes_ui)

Kudos to @larsks and @Alberto Hanna
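
A slightly more defensive variant of the same loop may be useful when some records lack a title or MeSH headings. This is only a sketch: SAMPLE is a trimmed-down stand-in for the efetch response, and extract_articles is a hypothetical helper name, not part of any PubMed module. It relies on findtext with a default and on Element.get, both of which return None instead of raising when something is missing.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the efetch response (assumption: same element layout).
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Risk factors of fracture following curettage.</ArticleTitle>
      </Article>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D001843" MajorTopicYN="N">Bone Cements</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D003475" MajorTopicYN="N">Curettage</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation/>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_articles(xml_text):
    """Return one dict per PubmedArticle with its title and MeSH descriptors."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.findall("PubmedArticle"):
        # findtext returns the default instead of raising when the tag is absent.
        title = article.findtext("MedlineCitation//ArticleTitle", default=None)
        descriptors = article.findall("MedlineCitation//DescriptorName")
        records.append({
            "title": title,
            "mesh_names": [d.text for d in descriptors],
            # .get returns None if a descriptor has no UI attribute.
            "mesh_uis": [d.get("UI") for d in descriptors],
        })
    return records


records = extract_articles(SAMPLE)
for record in records:
    print(record)
```

The second, empty record in SAMPLE is there only to show that a missing title yields None and missing descriptors yield empty lists.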

CodePudding user response:

A PubmedArticle element has no ArticleTitle element as a direct child, and find() with a bare tag name only searches direct children. The ArticleTitle element is in PubmedArticle/MedlineCitation/Article. So you could write:

for article in root.findall("PubmedArticle"):
    article_title = article.find("MedlineCitation").find("Article").find("ArticleTitle")
    print(article_title.text)

Or you could use an XPath expression (ElementTree supports a limited subset of XPath):

for article in root.findall("PubmedArticle"):
    article_title = article.find("MedlineCitation//ArticleTitle")
    print(article_title.text)
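
To make the difference concrete, here is a small self-contained sketch. The inline XML is a made-up miniature of the PubMed layout, not the real response. It compares a bare tag name, an explicit child path, and a descendant search.

```python
import xml.etree.ElementTree as ET

# Miniature document with the same nesting as the PubMed response (assumption).
root = ET.fromstring(
    "<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>"
    "<ArticleTitle>Example title</ArticleTitle>"
    "</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"
)
article = root.find("PubmedArticle")

# A bare tag name matches direct children only, so this finds nothing.
direct = article.find("ArticleTitle")
# Spelling out every level of the path works.
explicit = article.find("MedlineCitation/Article/ArticleTitle")
# ".//" searches all descendants of the current element.
anywhere = article.find(".//ArticleTitle")

print(direct)
print(explicit.text)
print(anywhere.text)
```

The first print shows None, which is the same result the question's loop produced; the other two reach the title.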

CodePudding user response:

I think you're looking for something like this:

import xml.etree.ElementTree as ET

tree = ET.parse('xx.xml')
root = tree.getroot()

for article in root.findall(".//PubmedArticle"):
    article_title = article.find(".//ArticleTitle")
    print(article_title.text)

for descriptor_name in root.findall('.//DescriptorName'):
    if descriptor_name.text == 'Bone Cements':
        print(descriptor_name.get('UI'))
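
If the lookup should go the other way (from a UI code to its name), or if the descriptors need to stay grouped per article, something along these lines might work. It is a sketch on a shortened, inline copy of the XML; mesh_by_pmid is just an illustrative variable name. It uses the attribute predicate [@attrib='value'], which is part of the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of one article (assumption: same layout as the response).
root = ET.fromstring("""<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID Version="1">35590280</PMID>
<MeshHeadingList>
<MeshHeading><DescriptorName UI="D001843" MajorTopicYN="N">Bone Cements</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName UI="D001859" MajorTopicYN="Y">Bone Neoplasms</DescriptorName></MeshHeading>
</MeshHeadingList>
</MedlineCitation></PubmedArticle></PubmedArticleSet>""")

# Group descriptors per article: {pmid: {UI: descriptor name}}.
mesh_by_pmid = {}
for article in root.findall("PubmedArticle"):
    pmid = article.findtext("MedlineCitation/PMID")
    mesh_by_pmid[pmid] = {
        d.get("UI"): d.text for d in article.iter("DescriptorName")
    }

# Select a descriptor directly by its UI attribute.
hit = root.find(".//DescriptorName[@UI='D001859']")

# Select only the descriptors flagged as major topics.
major = [d.text for d in root.findall(".//DescriptorName[@MajorTopicYN='Y']")]

print(mesh_by_pmid)
print(hit.text)
print(major)
```

Keeping the search inside the per-article loop avoids mixing descriptors from different articles, which a document-wide root.findall('.//DescriptorName') would do.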