I need to extract data from XML using lxml and XPath. Specifically, I want to get the EventId value (122157660) from the following document:
<B2B_DATA>
<B2B_METADATA>
<EventId>122157660</EventId>
<MessageType>Request</MessageType>
</B2B_METADATA>
<PAYLOAD>
<![CDATA[<?xml version="1.0"?>
<REQUEST_GROUP MISMOVersionID="1.1.1">
<REQUESTING_PARTY _Name="CityBank" _StreetAddress="801 Main St" _City="rockwall" _State="MD" _PostalCode="11311" _Identifier="416">
<CONTACT_DETAIL _Name="XX Davis">
<CONTACT_POINT _Type="Phone" _Value="1236573348"/>
<CONTACT_POINT _Type="Email" _Value="[email protected]"/>
</CONTACT_DETAIL>
</REQUESTING_PARTY>
</REQUEST_GROUP>]]>
</PAYLOAD>
</B2B_DATA>
I am able to do this using loops and iter(), but I would like to use XPath for cleaner, shorter code. I am also using lxml to parse the CDATA section, so I am trying to avoid the ElementTree library.
This is what I tried:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
for neighbor in root.iter('B2B_METADATA'):
    for element in neighbor:
        if element.tag == 'EventId':
            print(element.text)
Requested output: EventId 122157660
CodePudding user response:
Actually, for very simple queries, the built-in ElementTree supports a limited subset of XPath:
print(root.findall('.//B2B_METADATA/EventId')[0].text)
This is similar to lxml's xpath method (available only on trees parsed with lxml):
print(root.xpath('//B2B_METADATA/EventId')[0].text)
Or by chaining find calls on the parsed elements:
print(root.find('B2B_METADATA').find('EventId').text)
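As a rough sketch of the built-in ElementTree route, the snippet below runs the same kind of path lookup on a trimmed inline copy of the sample (the PAYLOAD element is left out, and SAMPLE is just an illustrative name). It also shows findtext, which returns None rather than raising when nothing matches; whether that behaviour suits you depends on how you want to handle missing elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (PAYLOAD omitted for brevity).
SAMPLE = """<B2B_DATA>
<B2B_METADATA>
<EventId>122157660</EventId>
<MessageType>Request</MessageType>
</B2B_METADATA>
</B2B_DATA>"""

root = ET.fromstring(SAMPLE)

# findall returns a list of matching elements; index 0 is the first match.
first_match = root.findall('.//B2B_METADATA/EventId')[0].text

# findtext returns the text of the first match, or None if nothing matches,
# so a missing element does not raise IndexError or AttributeError.
event_id = root.findtext('B2B_METADATA/EventId')
missing = root.findtext('B2B_METADATA/NoSuchTag')

print(first_match, event_id, missing)
```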
CodePudding user response:
To move your iterators down into XPath, you could use something like this:
result = tree.xpath('/B2B_DATA/B2B_METADATA/EventId/text()')
That returns the text node contained in the EventId element (nested in a B2B_METADATA element, nested in a B2B_DATA element) in your XML, i.e. 122157660. Note that lxml's xpath method returns the matches as a list of strings, so a single match comes back as a one-item list, and if there were multiple such text nodes in the XML they would all be in that list.
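Here is a minimal sketch of that call, assuming lxml is installed and using a trimmed inline copy of the sample instead of file.xml (SAMPLE and the variable names are illustrative). It shows the list result, and the XPath string() function as one way to get a plain string back.

```python
from lxml import etree

# Trimmed copy of the sample document (PAYLOAD omitted for brevity).
SAMPLE = b"""<B2B_DATA>
<B2B_METADATA>
<EventId>122157660</EventId>
<MessageType>Request</MessageType>
</B2B_METADATA>
</B2B_DATA>"""

root = etree.fromstring(SAMPLE)

# A path ending in text() yields a list of strings, one per matching text node.
texts = root.xpath('/B2B_DATA/B2B_METADATA/EventId/text()')

# Guard against an empty list before indexing.
first = texts[0] if texts else None

# string(...) converts the first matching node to a single string
# (an empty string if nothing matches).
as_string = root.xpath('string(/B2B_DATA/B2B_METADATA/EventId)')

print(texts, first, as_string)
```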
If you knew that EventId only ever appears inside /B2B_DATA/B2B_METADATA, then you could shorten your XPath to //EventId/text(). It would be computationally less efficient, because the // would search the entire document for EventId elements, but you may value conciseness over efficiency, especially if the XML document is really small (like your sample).
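To illustrate why that assumption matters, the sketch below uses a hypothetical variant of the sample with a second EventId under an invented AUDIT element (neither is in the original XML). It assumes lxml is installed. The // form picks up both values, while the full path only matches the one inside B2B_METADATA.

```python
from lxml import etree

# Hypothetical variant of the sample: the AUDIT element and its EventId
# are invented here to show what // matches.
DOC = b"""<B2B_DATA>
<B2B_METADATA>
<EventId>122157660</EventId>
</B2B_METADATA>
<AUDIT>
<EventId>999</EventId>
</AUDIT>
</B2B_DATA>"""

root = etree.fromstring(DOC)

# // matches EventId elements at any depth, in document order.
anywhere = root.xpath('//EventId/text()')

# The full path matches only the EventId nested in B2B_METADATA.
exact = root.xpath('/B2B_DATA/B2B_METADATA/EventId/text()')

print(anywhere)
print(exact)
```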