How to filter child element by putting condition on another child element in XML


In the XML below, I need to extract the BinaryImage when the ImageType is fullImage.

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Header />
      <Instation xmlns="http://ffsf.us.com/schema_1-2" SchemaVersion="1.2">

I tried with findall and xpath but it gave the following errors:






lxml.etree.XPathEvalError: Invalid expression

SyntaxError: invalid predicate

The documentation does not seem to be very helpful. What am I doing wrong?

CodePudding user response:

The code below should work:

import xml.etree.ElementTree as ET

xml = '''<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Header />
      <Instation xmlns="http://ffsf.us.com/schema_1-2" SchemaVersion="1.2">

root = ET.fromstring(xml)
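# Clark notation ({uri}tag) for the default namespace declared on Instation
ns = '{http://ffsf.us.com/schema_1-2}'
# Keep the BinaryImage text of every Image whose ImageType is 'fullImage'
binary_images = [
    im.find(ns + 'BinaryImage').text
    for im in root.findall('.//' + ns + 'Image')
    if im.find(ns + 'ImageType').text == 'fullImage'
]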
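
Because the XML in the question is truncated, here is a minimal self-contained sketch of the same filter. The sample document (a soapenv:Body wrapper and three Image elements with placeholder payloads) and the helper name full_images are assumptions made for illustration, not taken from the question. It uses findtext instead of find(...).text, so an Image without an ImageType child is skipped rather than raising AttributeError.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element layout below is assumed for illustration.
SAMPLE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header />
  <soapenv:Body>
    <Instation xmlns="http://ffsf.us.com/schema_1-2" SchemaVersion="1.2">
      <Image>
        <ImageType>fullImage</ImageType>
        <BinaryImage>AAAA</BinaryImage>
      </Image>
      <Image>
        <ImageType>thumbnail</ImageType>
        <BinaryImage>BBBB</BinaryImage>
      </Image>
      <Image>
        <BinaryImage>CCCC</BinaryImage>
      </Image>
    </Instation>
  </soapenv:Body>
</soapenv:Envelope>"""

# Clark notation prefix for the default namespace on Instation
NS = "{http://ffsf.us.com/schema_1-2}"


def full_images(xml_text):
    """Return the BinaryImage text of each Image whose ImageType is 'fullImage'."""
    root = ET.fromstring(xml_text)
    return [
        im.findtext(NS + "BinaryImage")
        for im in root.findall(".//" + NS + "Image")
        # findtext returns None when ImageType is absent, so that Image is skipped
        if im.findtext(NS + "ImageType") == "fullImage"
    ]


print(full_images(SAMPLE))  # ['AAAA']
```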



CodePudding user response:

Since the elements you need live in a default namespace, use the namespaces argument, which both findall and xpath accept. Pass a dictionary that maps the namespace URI to a prefix of your choice (e.g., doc), then put that prefix on every element name in the XPath expression.

Additionally, remove the @ from your XPath: @ selects attributes, and the nodes you are matching are child elements, not attributes.

import lxml.etree as lx

doc = lx.parse("Input.xml")

nmsp = {"doc": "http://ffsf.us.com/schema_1-2"}
xpr = ".//doc:Image[doc:ImageType='fullImage']/doc:BinaryImage"

images_findall = [d.text for d in doc.findall(xpr, namespaces=nmsp)]

images_xpath = [d.text for d in doc.xpath(xpr, namespaces=nmsp)]
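
The prefix mapping and the element-versus-attribute distinction can be tried without Input.xml. The sketch below is an assumption-laden illustration: it swaps in the standard library's xml.etree.ElementTree (whose findall also accepts a namespaces dictionary) in place of lxml, and the sample XML is hypothetical. It shows that a child-element predicate matches, while the same test written with @ looks for an attribute and finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element layout below is assumed for illustration.
SAMPLE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header />
  <soapenv:Body>
    <Instation xmlns="http://ffsf.us.com/schema_1-2" SchemaVersion="1.2">
      <Image>
        <ImageType>fullImage</ImageType>
        <BinaryImage>AAAA</BinaryImage>
      </Image>
      <Image>
        <ImageType>thumbnail</ImageType>
        <BinaryImage>BBBB</BinaryImage>
      </Image>
    </Instation>
  </soapenv:Body>
</soapenv:Envelope>"""

# Map the default-namespace URI to a prefix of our own choosing
NSMAP = {"doc": "http://ffsf.us.com/schema_1-2"}

root = ET.fromstring(SAMPLE)

# Predicate on a child element: matches the first Image
by_child = [
    e.text
    for e in root.findall(
        ".//doc:Image[doc:ImageType='fullImage']/doc:BinaryImage", NSMAP
    )
]

# Predicate on an attribute: Image has no ImageType attribute, so nothing matches
by_attr = root.findall(
    ".//doc:Image[@ImageType='fullImage']/doc:BinaryImage", NSMAP
)

print(by_child)  # ['AAAA']
print(by_attr)   # []
```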

Do note: findall supports only a limited subset of XPath (such as the expression above), whereas xpath supports the full XPath 1.0 specification. With xpath, for example, you could also have used the preceding-sibling axis:

xpr = ".//doc:Image/doc:BinaryImage[preceding-sibling::doc:ImageType='fullImage']"

images_xpath = [d.text for d in doc.xpath(xpr, namespaces=nmsp)]