I am currently struggling with iterating over the results of an XPath expression. I am trying to retrieve all the system-out nodes that contain the substring "[[SOMETHING|". The problem is that I get the following syntax error, which points to the tree.iterfind call:
for elem in tree.iterfind('.//system-out[contains(.,"[[SOMETHING|")]'):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src/lxml/etree.pyx", line 2288, in lxml.etree._ElementTree.iterfind
File "src/lxml/etree.pyx", line 1588, in lxml.etree._Element.iterfind
File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
File "src/lxml/_elementpath.py", line 295, in lxml._elementpath._build_path_iterator
File "src/lxml/_elementpath.py", line 237, in lxml._elementpath.prepare_predicate
SyntaxError: invalid predicate
from lxml import etree

tree = etree.parse(test_file)
for elem in tree.iterfind('.//system-out[contains(.,"[[SOMETHING|")]'):
    print("do something")
The above is my code. As far as I can see, it has no syntax error, and when I test the XPath expression with a free online formatter tool, it works. I just can't see what is wrong. I have tried lxml's findall function as well, but I get the same error. I have also tried compiling the XPath expression with etree.XPath and passing the compiled object as the argument, but that raised the following TypeError, which makes sense:
TypeError: 'lxml.etree.XPath' object is unsliceable
Is there something I am missing, or is this simply an expression that the lxml package does not support?
Answer:
If searching for SOMETHING instead of [[SOMETHING| is still unique enough, I'd suggest replacing this:
'.//system-out[contains(.,"[[SOMETHING|")]'
with just this:
'.//system-out[contains(.,"SOMETHING")]'
So the entire code line would be:
for elem in tree.iterfind('.//system-out[contains(.,"SOMETHING")]'):
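A quick way to check whether the square brackets are really what trips up the parser is to run both versions of the predicate through iterfind() and through xpath(). A minimal sketch, assuming a small hypothetical JUnit-style document in place of the asker's test_file (the helper try_iterfind is also just for illustration):

```python
from lxml import etree

# Hypothetical sample document standing in for the asker's test_file.
XML = b"""<testsuite>
  <testcase name="a"><system-out>log [[SOMETHING|42]] end</system-out></testcase>
  <testcase name="b"><system-out>plain output</system-out></testcase>
</testsuite>"""
root = etree.fromstring(XML)

def try_iterfind(path):
    """Return the tags iterfind() matches, or the error text if it rejects the path."""
    try:
        return [el.tag for el in root.iterfind(path)]
    except SyntaxError as exc:
        return f"SyntaxError: {exc}"

with_brackets = './/system-out[contains(.,"[[SOMETHING|")]'
without_brackets = './/system-out[contains(.,"SOMETHING")]'

for path in (with_brackets, without_brackets):
    # iterfind() parses the path with lxml's ElementPath implementation...
    print("iterfind:", try_iterfind(path))
    # ...while xpath() evaluates the same string as full XPath 1.0.
    print("xpath:   ", len(root.xpath(path)), "match(es)")
```

If both iterfind() calls report the same "invalid predicate" error, the brackets are not the cause, and the shortened search string would still have to be evaluated with xpath().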
Answer:
As Martin Honnen explained in the comments, the find methods (iterfind, find, findall) in ElementTree and lxml do not support the full XPath 1.0 syntax, which explains the SyntaxError: invalid predicate error.
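For contrast, the find methods do accept a small, fixed set of predicate forms; a function call such as contains() is simply not one of them. A minimal sketch, assuming a hypothetical two-testcase document, that exercises three forms the ElementPath parser does handle (attribute value, child tag, position):

```python
from lxml import etree

# Hypothetical sample document, not the asker's actual file.
XML = b"""<testsuite>
  <testcase name="a"><system-out>log [[SOMETHING|42]] end</system-out></testcase>
  <testcase name="b"><system-out>plain output</system-out></testcase>
</testsuite>"""
root = etree.fromstring(XML)

# [@attr='value']: compare an attribute with a literal
by_attr = root.findall(".//testcase[@name='b']")
# [tag]: keep elements that have at least one child with this tag
with_child = root.findall(".//testcase[system-out]")
# [n]: 1-based position among siblings with the same tag
first = root.findall(".//testcase[1]")

print([e.get("name") for e in by_attr])
print([e.get("name") for e in with_child])
print([e.get("name") for e in first])
```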
Instead, I used lxml's xpath() method (tree.xpath()), which does support XPath 1.0. Once I could retrieve the matching text that way, I used the result of xpath() to find every element whose text is exactly that string, using a much simpler XPath expression that iterfind can understand.
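# xpath() supports full XPath 1.0, including contains(); [0] takes the first match
occ = tree.xpath('.//system-out[contains(.,"[[SOMETHING|")]')[0].text
# iterfind() only needs the simple [.="text"] predicate here, which ElementPath accepts
for elem in tree.iterfind(f'.//*[.="{occ}"]'):
    print("do something")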
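The second iterfind() pass is optional: xpath() already returns the matching elements as a list, so they can be looped over directly. A minimal alternative sketch (a variation on the two-step approach in this answer, not the asker's code) that also shows how a compiled etree.XPath object is meant to be used: it is called with the tree or element, rather than passed to iterfind(), which treats its argument as a path string (presumably why it tried to slice the compiled object in the question). The sample document and the XPath variable name $needle are assumptions for illustration:

```python
from lxml import etree

# Hypothetical stand-in for the asker's test_file.
XML = b"""<testsuite>
  <testcase name="a"><system-out>log [[SOMETHING|42]] end</system-out></testcase>
  <testcase name="b"><system-out>plain output</system-out></testcase>
  <testcase name="c"><system-out>again [[SOMETHING|7]]</system-out></testcase>
</testsuite>"""

# Compile once; $needle is an XPath variable bound at call time,
# so the search string needs no quoting inside the expression.
find_marked = etree.XPath(".//system-out[contains(., $needle)]")

tree = etree.ElementTree(etree.fromstring(XML))
# Calling the compiled object returns a plain list of matching elements.
matches = find_marked(tree, needle="[[SOMETHING|")
for elem in matches:
    print(elem.getparent().get("name"), elem.text)
```

Binding the search string as a variable also sidesteps the quoting problem the f-string approach has when the text itself contains a double quote.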