I'm new to XPath and am working with an XML file which looks like this:
<doc>
  <component>
    <author> Bob </author>
  </component>
  <component>
    <sB>
      <component>
        <section ID='S1'>
          <title>Some s1 title</title>
        </section>
      </component>
      <component>
        <section ID='S2'>
          <title>Some s2 title</title>
        </section>
      </component>
    </sB>
  </component>
</doc>
I want to retrieve the component item above with section ID = S1, or alternatively the one that has a title element with text 'Some s1 title'. I cannot count on these things being in a particular order.
So far I've tried
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
res = tree.getroot().findall(".//*[title='Some s1 title']../../")
for i in res:
    ET.dump(i)
but that gets me both components, not just the one with the matching title.
I've also tried to search at the section ID level, like so:
res = tree.getroot().findall(".//*section[@ID='S1']/../")
for i in res:
    ET.dump(i)
but that doesn't get me the parent (the whole component) and instead just gets me the section.
Both of these seem like they might work from the simple example syntax I've seen online, but clearly in both cases I'm missing some understanding of what is actually happening. Could someone please clarify what is happening here and why I'm not getting what I would expect?
CodePudding user response:
Craft your XPath expression to select component and then use the predicate (the conditions inside the square brackets) to determine which components you want, such as:
component containing section with ID = 'S1'
//component[./section[@ID='S1']]
or component containing section/title = 'Some s1 title'
//component[./section/title/text() = 'Some s1 title']
or component containing section with ID = 'S1' and that section has title = 'Some s1 title'
//component[./section[@ID='S1']/title/text() = 'Some s1 title']
and other variations thereof are possible.
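If you are staying with xml.etree.ElementTree as in the question, note that its findall supports only a subset of XPath, and as far as I know a nested path inside a predicate (such as ./section[...] inside component[...]) is not part of that subset. Here is a minimal sketch of the same idea, selecting component and then testing a condition on it, with the condition applied in Python. The inline XML string stands in for test.xml, and components_where is a hypothetical helper name, not a library function.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's document, standing in for test.xml.
XML = """
<doc>
  <component>
    <author> Bob </author>
  </component>
  <component>
    <sB>
      <component>
        <section ID='S1'>
          <title>Some s1 title</title>
        </section>
      </component>
      <component>
        <section ID='S2'>
          <title>Some s2 title</title>
        </section>
      </component>
    </sB>
  </component>
</doc>
"""
root = ET.fromstring(XML)


def components_where(root, section_path):
    """Hypothetical helper: component elements with a direct child matching section_path."""
    return [c for c in root.iter("component") if c.find(section_path) is not None]


# The three conditions from the answer, written in ElementTree's path subset.
by_id = components_where(root, "section[@ID='S1']")
by_title = components_where(root, "section[title='Some s1 title']")
by_both = components_where(root, "section[@ID='S1'][title='Some s1 title']")

for comp in by_id:
    ET.dump(comp)
```

Because the test runs against each component directly, the order of the components in the file does not matter.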
CodePudding user response:
There are syntax errors with both of your XPaths:
.//*[title='Some s1 title']../../
is missing a / after the predicate. Then this one overshoots upward anyway.
.//*section[@ID='S1']/../
cannot have a * before section. Without the * (and without the trailing /), this one would work.
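If you do want to keep stepping up with .. in xml.etree.ElementTree, here is a sketch of the repaired second path. One point worth flagging as my reading of ElementTree's behavior rather than standard XPath: it appears to treat a trailing / as /*, which would explain why the original path returned the section (the child of the selected component) instead of the component. The inline XML stands in for test.xml.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's document, standing in for test.xml.
XML = """
<doc>
  <component>
    <author> Bob </author>
  </component>
  <component>
    <sB>
      <component>
        <section ID='S1'>
          <title>Some s1 title</title>
        </section>
      </component>
      <component>
        <section ID='S2'>
          <title>Some s2 title</title>
        </section>
      </component>
    </sB>
  </component>
</doc>
"""
root = ET.fromstring(XML)

# Match the section by its ID attribute, then step up once to its parent.
fixed = root.findall(".//section[@ID='S1']/..")

# The same path with the trailing slash from the question, for comparison.
with_slash = root.findall(".//section[@ID='S1']/../")

print([e.tag for e in fixed])
print([e.tag for e in with_slash])
```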
But rather than repairing and working from there, you don't really need to select along the parent or ancestor axis; it is better to use a predicate higher in the hierarchy anyway:
This XPath,
//component[section/@ID='S1']
selects the component elements with section children whose ID attribute value equals 'S1'.
This XPath,
//component[section/title='Some s1 title']
selects the component elements with section children with title children with a string value equal to 'Some s1 title'.
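These expressions assume a full XPath 1.0 engine. The question uses xml.etree.ElementTree, which documents only a limited XPath subset, and to my knowledge a child path inside a predicate (section/title=...) falls outside it. Here is a hedged sketch of an equivalent within that subset: put the predicate on section and step up to its parent. The inline XML stands in for test.xml, and nested_supported is just an illustrative flag.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's document, standing in for test.xml.
XML = """
<doc>
  <component>
    <author> Bob </author>
  </component>
  <component>
    <sB>
      <component>
        <section ID='S1'>
          <title>Some s1 title</title>
        </section>
      </component>
      <component>
        <section ID='S2'>
          <title>Some s2 title</title>
        </section>
      </component>
    </sB>
  </component>
</doc>
"""
root = ET.fromstring(XML)

# [title='...'] keeps section elements that have a title child with that text;
# /.. then selects the parent of each match, i.e. the enclosing component.
by_title = root.findall(".//section[title='Some s1 title']/..")

# Check whether this ElementTree accepts the nested-path predicate at all.
try:
    root.findall(".//component[section/title='Some s1 title']")
    nested_supported = True
except SyntaxError:
    nested_supported = False

for comp in by_title:
    ET.dump(comp)
print("nested predicate accepted:", nested_supported)
```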