XPath expression for getting this grandparent

Time:10-26

I'm new to XPath and am working with an XML file which looks like this:

<doc>
    <component>
        <author> Bob </author>
    </component>
    
    <component>
        <sB>
            <component>
                <section ID='S1'>
                    <title>Some s1 title</title>
                </section>
            </component>
            <component>
                <section ID='S2'>
                    <title>Some s2 title</title>
                </section>
            </component>
        </sB>
    </component>
</doc>

I want to retrieve the component element that contains the section with ID = S1, or alternatively the one whose section has a title element with the text 'Some s1 title'. I cannot count on these elements appearing in a particular order.

So far I've tried

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
res = tree.getroot().findall(".//*[title='Some s1 title']../../")
for i in res:
    ET.dump(i)

but that gets me both components, not just the one with the matching title.

I've also tried to search at the section ID level, like so:

res = tree.getroot().findall(".//*section[@ID='S1']/../")
for i in res:
    ET.dump(i)

but that doesn't get me the parent (the whole component) and instead just gets me the section.

Both of these seem like they might work from the simple example syntax I've seen online, but clearly in both cases I'm missing some understanding of what is actually happening. Could someone please clarify what is happening here and why I'm not getting what I would expect?

CodePudding user response:

Write your XPath expression so that it selects component, then use a predicate (the condition inside the square brackets) to pick out the components you want. For example:

component containing section with ID = 'S1'

//component[./section[@ID='S1']]

or component containing section/title = 'Some s1 title'

//component[./section/title/text() = 'Some s1 title']

or component containing section with ID = 'S1' and that section has title = 'Some s1 title'

//component[./section[@ID='S1']/title/text() = 'Some s1 title']

and other variations thereof are possible.
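
If you are staying with xml.etree.ElementTree from the question, note that it implements only a limited subset of XPath, and as far as I know it does not accept nested or multi-step predicates like the ones above (lxml's xpath() method does). Here is a minimal sketch of the same idea using only the standard library, assuming the sample document from the question is held in a string. The helper name components_with is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<doc>
    <component>
        <author> Bob </author>
    </component>
    <component>
        <sB>
            <component>
                <section ID='S1'>
                    <title>Some s1 title</title>
                </section>
            </component>
            <component>
                <section ID='S2'>
                    <title>Some s2 title</title>
                </section>
            </component>
        </sB>
    </component>
</doc>"""


def components_with(root, child_path):
    """Return every component element that has a direct child matching child_path.

    Hypothetical helper: it plays the role of the XPath predicate by testing
    each component with a simple path that ElementTree supports.
    """
    return [c for c in root.iter("component") if c.find(child_path) is not None]


root = ET.fromstring(XML)

# Equivalent in spirit to //component[section[@ID='S1']]
by_id = components_with(root, "section[@ID='S1']")

# Equivalent in spirit to //component[section/title='Some s1 title']
by_title = components_with(root, "section[title='Some s1 title']")

for component in by_id:
    ET.dump(component)
```

Both lists hold the same single component here, because the relative path passed to find() only looks at direct children of each component.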

CodePudding user response:

There are syntax errors with both of your XPaths:

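  1. .//*[title='Some s1 title']../../ is missing a / after the predicate. Even with that fixed and the trailing / dropped, it climbs one level too far: * matches the section (the element with the title child), so ../.. lands on sB rather than on the component.

  2. .//*section[@ID='S1']/../ cannot have a * before section, and the trailing / is not valid XPath either. With both removed, .//section[@ID='S1']/.. selects the component you want.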
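
For reference, here is roughly how the two repaired paths behave in ElementTree, which does support .. for stepping to a parent. This is a sketch on a trimmed copy of the question's document (the author component is omitted). ElementTree's path parser appears to be lenient about the original mistakes rather than rejecting them, which would explain why the calls in the question returned elements instead of raising an error.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (the author component is omitted).
XML = """<doc>
    <component>
        <sB>
            <component>
                <section ID='S1'><title>Some s1 title</title></section>
            </component>
            <component>
                <section ID='S2'><title>Some s2 title</title></section>
            </component>
        </sB>
    </component>
</doc>"""

root = ET.fromstring(XML)

# Path 2 repaired: no * before section, no trailing /.
# One step up from the matching section is its component.
parents = root.findall(".//section[@ID='S1']/..")
print([e.tag for e in parents])  # ['component']

# Path 1 repaired: / added after the predicate, trailing / dropped.
# * matches the section (the element with the title child), so two steps up
# lands on sB, one level above the component that was wanted.
grandparents = root.findall(".//*[title='Some s1 title']/../..")
print([e.tag for e in grandparents])  # ['sB']
```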

But rather than repairing these and working from there, note that you don't really need the parent or ancestor axis at all. It is simpler to put a predicate higher in the hierarchy.


This XPath,

//component[section/@ID='S1']

selects the component elements that have a section child whose ID attribute value equals 'S1'. (The attribute name is case-sensitive, so it must be written ID, as in the document.)


This XPath,

//component[section/title='Some s1 title']

selects the component elements that have a section child with a title child whose string value equals 'Some s1 title'.
