Parsing XML: How can I get all the information from lines with same name but different text in XML f

Time:02-10

I'm trying to parse an ICD10 XML file and I'm having some trouble extracting information.

<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
  <name>A00.0</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
  <inclusionTerm>
    <note>Classical cholera</note>
    <note>Classical cholera again</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.1</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
  <inclusionTerm>
    <note>Cholera eltor</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.9</name>
  <desc>Cholera, unspecified</desc>
</diag>
</diag>

Using this:

from xml.etree import ElementTree as ET
root = ET.parse('cut.xml')
diag = root.find(".//*[name='A00.0']")
inclusionTerm = diag.find('inclusionTerm')
if inclusionTerm is not None:
    print('Inclusion Term:', inclusionTerm.find('note').text)

The code prints only the first note inside the inclusionTerm of the A00.0 entry. How can I get all of the note elements inside inclusionTerm?

CodePudding user response:

Use findall with an XPath expression that selects every note element under the matching diag:

from xml.etree import ElementTree as ET

xml = '''<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
  <name>A00.0</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
  <inclusionTerm>
    <note>Classical cholera</note>
    <note>Classical cholera again</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.1</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
  <inclusionTerm>
    <note>Cholera eltor</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.9</name>
  <desc>Cholera, unspecified</desc>
</diag>
</diag>'''

root = ET.fromstring(xml)

notes = root.findall('.//diag[name="A00.0"]/inclusionTerm/note')

for note in notes:
    print(note.text)
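
The same idea can be applied to the code from the question: find returns only the first matching element, while findall returns a list of all of them. Below is a minimal sketch under that assumption. The helper name notes_for is hypothetical, and the XML is a shortened copy of the sample above.

```python
from xml.etree import ElementTree as ET

# Shortened copy of the sample document from the question.
xml = '''<diag>
<name>A00</name>
<diag>
  <name>A00.0</name>
  <inclusionTerm>
    <note>Classical cholera</note>
    <note>Classical cholera again</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.9</name>
</diag>
</diag>'''

root = ET.fromstring(xml)


def notes_for(root, code):
    """Return the text of every inclusionTerm note for the given code.

    Hypothetical helper for illustration; returns an empty list when the
    code is not found or the entry has no inclusionTerm.
    """
    # Same lookup as in the question: any descendant whose <name> child
    # has the requested text.
    diag = root.find(f".//*[name='{code}']")
    if diag is None:
        return []
    # findall returns every matching element; find would return only the first.
    return [note.text for note in diag.findall('inclusionTerm/note')]


print(notes_for(root, 'A00.0'))
print(notes_for(root, 'A00.9'))
```

Returning an empty list for entries without an inclusionTerm (such as A00.9) means the caller does not need a separate None check before looping.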