I'm trying to do something I thought should be very simple in ElementTree: find elements with specific tag content. The docs give the example:
*[tag='text']* Selects all elements that have a child named *tag* whose complete text content, including descendants, equals the given *text*.
Which seems straightforward enough. However, it does not work as I expect. Suppose I want to find all occurrences of <note>NEW</note>. The following complete example:
#!/usr/bin/env python
import xml.etree.ElementTree as ET
xml = """<?xml version="1.0"?>
<entry>
<foo>blah</foo>
<foo>bblic</foo>
<foo>fjdks<note>NEW</note></foo>
<foo>fdfsd</foo>
<foo>ljklj<note>NEW</note></foo>
</entry>
"""
root = ET.fromstring(xml)
print("Number of 'foo' elements: %d" % len(root.findall('.//foo')))
print("Number of new 'foo' elements: %d" % len(root.findall('.//[note="NEW"]')))
Yields:
$ python foo.py
Number of 'foo' elements: 5
Traceback (most recent call last):
File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 370, in iterfind
selector = _cache[cache_key]
KeyError: ('.//[note="NEW"]',)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/foo.py", line 17, in <module>
print("Number of new 'foo' elements: %d" % len(root.findall('.//[note="NEW"]')))
File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 411, in findall
return list(iterfind(elem, path, namespaces))
File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 384, in iterfind
selector.append(ops[token[0]](next, token))
File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 193, in prepare_descendant
raise SyntaxError("invalid descendant")
SyntaxError: invalid descendant
How am I meant to do this simple task?
CodePudding user response:
The docs also say:
Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate.
Taking this into account,
root.findall('.//[note="NEW"]')
is illegal. Either add * before [ to match any tag, i.e.
root.findall('.//*[note="NEW"]')
or put a tag name before [ to match only that tag, i.e.
root.findall('.//foo[note="NEW"]')
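To make the difference concrete, here is a minimal sketch that reuses the XML from the question and runs all three paths. The variable names (any_tag, only_foo, bare_predicate_failed) are just illustrative; the only API calls are ET.fromstring and Element.findall.

```python
import xml.etree.ElementTree as ET

xml = """<?xml version="1.0"?>
<entry>
<foo>blah</foo>
<foo>bblic</foo>
<foo>fjdks<note>NEW</note></foo>
<foo>fdfsd</foo>
<foo>ljklj<note>NEW</note></foo>
</entry>
"""

root = ET.fromstring(xml)

# A predicate with nothing in front of it is rejected when the path is parsed.
bare_predicate_failed = False
try:
    root.findall('.//[note="NEW"]')
except SyntaxError as exc:
    bare_predicate_failed = True
    print("Rejected path:", exc)

# '*' before the predicate: any descendant element with a matching <note> child.
any_tag = root.findall('.//*[note="NEW"]')

# A tag name before the predicate: only <foo> elements with a matching <note> child.
only_foo = root.findall('.//foo[note="NEW"]')

print("Matches with '*':", len(any_tag))
print("Matches with 'foo':", len(only_foo))
```

With this particular document both paths select the same two foo elements, because foo is the only element that has a note child. They would differ if some other element also contained <note>NEW</note>.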
CodePudding user response:
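The main problem seems to be an expectation that the second search depends on the first, which it does not. Each findall call is independent, so the path itself has to name the elements the predicate applies to.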
This works (the [tag='text'] predicate used here does not need a recent Python; per the docs, only the != predicate forms require Python >= 3.10):
for foo in root.findall('.//foo[note="NEW"]'):
    print(foo.text)
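For readers wondering what the predicate actually tests, the following is a rough hand-written equivalent, offered as a sketch rather than as how ElementTree implements it. The helper has_note_with_text and the shortened sample document are made up for illustration; the third foo nests an extra <b> element inside note to show that the "complete text content, including descendants" wording from the docs applies.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample; the last <note> splits its text across a nested <b>.
doc = ET.fromstring(
    "<entry>"
    "<foo>blah</foo>"
    "<foo>fjdks<note>NEW</note></foo>"
    "<foo>ljklj<note>NE<b>W</b></note></foo>"
    "</entry>"
)


def has_note_with_text(elem, text):
    """Return True if elem has a direct <note> child whose full text equals text."""
    # itertext() walks the child and all of its descendants, in document order.
    return any("".join(note.itertext()) == text for note in elem.findall("note"))


# Manual filter: walk every <foo> and apply the check by hand.
manual = [foo for foo in doc.iter("foo") if has_note_with_text(foo, "NEW")]

# The same selection expressed as a path predicate.
via_path = doc.findall('.//foo[note="NEW"]')

print([foo.text for foo in manual])
print([foo.text for foo in via_path])
```

Both lists should contain the same two foo elements. The path form is shorter, while the manual form is easier to extend if the condition becomes more complex than an exact text match.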