I'm trying to parse an ICD10 XML file and I'm having some trouble extracting information.
<diag>
  <name>A00</name>
  <desc>Cholera</desc>
  <diag>
    <name>A00.0</name>
    <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
    <inclusionTerm>
      <note>Classical cholera</note>
      <note>Classical cholera again</note>
    </inclusionTerm>
  </diag>
  <diag>
    <name>A00.1</name>
    <desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
    <inclusionTerm>
      <note>Cholera eltor</note>
    </inclusionTerm>
  </diag>
  <diag>
    <name>A00.9</name>
    <desc>Cholera, unspecified</desc>
  </diag>
</diag>
Using this:
from xml.etree import ElementTree as ET
root = ET.parse('cut.xml')
diag = root.find(".//*[name='A00.0']")
inclusionTerm = diag.find('inclusionTerm')
if inclusionTerm is not None:
    print('Inclusion Term: ', diag.find('inclusionTerm').find('note').text)
The code only prints the first note inside the inclusionTerm of A00.0. How can I write the code to get all of the note elements inside the inclusionTerm?
CodePudding user response:
You can write an XPath expression that selects all of the note elements and pass it to findall, which returns every match (find returns only the first one):
from xml.etree import ElementTree as ET
xml = '''<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
<name>A00.0</name>
<desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
<inclusionTerm>
<note>Classical cholera</note>
<note>Classical cholera again</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.1</name>
<desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
<inclusionTerm>
<note>Cholera eltor</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.9</name>
<desc>Cholera, unspecified</desc>
</diag>
</diag>'''
root = ET.fromstring(xml)
notes = root.findall('.//diag[name="A00.0"]/inclusionTerm/note')
for note in notes:
    print(note.text)
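If you need this for more than one code, the same XPath can be wrapped in a small helper. This is only a sketch: the function name notes_for_code and the shortened SAMPLE string are made up for illustration, and it assumes every code lives in a nested diag element with a name child, as in your file. Codes without an inclusionTerm simply give an empty list, so no None check is needed.

```python
from xml.etree import ElementTree as ET

# Shortened sample in the same shape as the XML in the question (assumption).
SAMPLE = """<diag>
  <name>A00</name>
  <diag>
    <name>A00.0</name>
    <inclusionTerm>
      <note>Classical cholera</note>
      <note>Classical cholera again</note>
    </inclusionTerm>
  </diag>
  <diag>
    <name>A00.9</name>
  </diag>
</diag>"""


def notes_for_code(root, code):
    """Return the text of every note under the inclusionTerm of one code.

    Hypothetical helper: findall returns an empty list when nothing
    matches, so a code without an inclusionTerm yields [].
    """
    path = f".//diag[name='{code}']/inclusionTerm/note"
    return [note.text for note in root.findall(path)]


root = ET.fromstring(SAMPLE)
print(notes_for_code(root, "A00.0"))  # ['Classical cholera', 'Classical cholera again']
print(notes_for_code(root, "A00.9"))  # []
```

In your original code, the smaller change is to replace .find('note') with .findall('note') on the inclusionTerm element and loop over the result.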