SyntaxError: invalid predicate using lxml iterfind

Time:12-23

I am struggling to iterate over the results of an XPath expression. I am trying to retrieve all the system-out nodes whose text contains the substring "[[SOMETHING|". The problem is that I get the following syntax error, which points at the tree.iterfind call.

    for elem in tree.iterfind('.//system-out[contains(.,"[[SOMETHING|")]'):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 2288, in lxml.etree._ElementTree.iterfind
  File "src/lxml/etree.pyx", line 1588, in lxml.etree._Element.iterfind
  File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
  File "src/lxml/_elementpath.py", line 295, in lxml._elementpath._build_path_iterator
  File "src/lxml/_elementpath.py", line 237, in lxml._elementpath.prepare_predicate
SyntaxError: invalid predicate

from lxml import etree

tree = etree.parse(test_file)
for elem in tree.iterfind('.//system-out[contains(.,"[[SOMETHING|")]'):
    print("do something")

The above is my code. As far as I can see, it contains no syntax error. I also tested the XPath expression in a free online formatter tool, and it worked there, so I can't see what is wrong. I tried lxml's findall function as well, but I get the same error. I also tried compiling the expression with etree.XPath and using the compiled object in its place, but that raised the following TypeError, which makes sense.

TypeError: 'lxml.etree.XPath' object is unsliceable

Is there something I am missing? Or is this simply an expression that the lxml find methods do not support?

CodePudding user response:

If matching on SOMETHING instead of [[SOMETHING| is still unique enough for your data, I'd suggest replacing .//system-out[contains(.,"[[SOMETHING|")] with just this:

'.//system-out[contains(.,"SOMETHING")]'

So the entire code line will be

for elem in tree.iterfind('.//system-out[contains(.,"SOMETHING")]'):

CodePudding user response:

As Martin Honnen explained in the comments, the find methods (iterfind, find, findall) in ElementTree and lxml do not support the full XPath 1.0 syntax, which explains the SyntaxError: invalid predicate error.
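To illustrate the limitation, here is a minimal sketch using the standard library's xml.etree.ElementTree, whose find methods accept a similarly restricted path syntax. The sample XML and the variable names are made up for the example and do not come from the question. It also shows one possible workaround that avoids XPath functions entirely: select by tag name and do the substring test in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the question.
XML = """<testsuite>
  <testcase name="a"><system-out>log [[SOMETHING|1]] end</system-out></testcase>
  <testcase name="b"><system-out>nothing here</system-out></testcase>
</testsuite>"""

root = ET.fromstring(XML)

# The find methods only parse a small subset of XPath, so a
# function call such as contains() inside a predicate is rejected.
try:
    root.findall('.//system-out[contains(.,"[[SOMETHING|")]')
    predicate_error = None
except SyntaxError as exc:
    predicate_error = str(exc)

# Workaround without XPath functions: iterate over the elements by
# tag name and filter on their text content in plain Python.
matches = [
    elem
    for elem in root.iter("system-out")
    if "[[SOMETHING|" in "".join(elem.itertext())
]

print(predicate_error)
print([elem.text for elem in matches])
```

The Python-side filter should behave the same with lxml elements, since iter() and itertext() exist there too, but it gives up the conciseness of a single XPath expression.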

I used lxml's xpath() method instead, which does support the full XPath 1.0 syntax. Having retrieved the matching text from the XML file that way, I then used the result of xpath() to iterate over all of the occurrences with a much simpler XPath expression that iterfind can understand.

occ = tree.xpath('.//system-out[contains(.,"[[SOMETHING|")]')[0].text
for elem in tree.iterfind(f'.//*[.="{occ}"]'):
     print("do something")
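Note that xpath() already returns a list of the matching elements, so it may be simpler to loop over that list directly rather than feeding one match's text back into iterfind. The following is only a sketch under that assumption: the sample XML and the variable name needle are invented for illustration. Passing the search string as an XPath variable through a keyword argument avoids having to quote it inside the expression.

```python
import io

from lxml import etree

# Hypothetical sample document, not taken from the question.
XML = b"""<testsuite>
  <testcase name="a"><system-out>first [[SOMETHING|1]]</system-out></testcase>
  <testcase name="b"><system-out>nothing here</system-out></testcase>
  <testcase name="c"><system-out>second [[SOMETHING|2]]</system-out></testcase>
</testsuite>"""

tree = etree.parse(io.BytesIO(XML))

# xpath() evaluates full XPath 1.0, so contains() is accepted.
# $needle is an XPath variable bound by the keyword argument.
hits = tree.xpath(
    './/system-out[contains(., $needle)]',
    needle="[[SOMETHING|",
)

# Every matching element is in the list, not only the first one.
texts = [elem.text for elem in hits]
for elem in hits:
    print(elem.text)
```

Compared with the two-step version, this handles every matching system-out node and does not break if the matched text itself contains a double quote.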
