Exclude Xpath from other Xpath

If you have two Xpaths you can join them with the | operator to return both their results in one result set. This essentially gives back the union of the two sets of elements. The example below gives back all divs and all spans on a website:

//div | //span

What I need is the difference (subtraction): all elements in the first Xpath group that are not in the second Xpath group. So far I have seen that there is an except operator, but it only works in XPath 2.0, and I need an XPath 1.0 solution. I have seen that the not() function might help, but I was not able to make it work.

As an example imagine the following:

<tr>
    <td>1</td>
    <td>2</td>
    <td>3</td>
    <td>4</td>
    <td>5</td>
</tr>

In this example I would have the Xpath group //tr/td. I would want to exclude <td>1</td> and <td>4</td>. Although there are many ways to solve the problem I am specifically looking for a solution where I can say in an Xpath: "Here is a group of elements and exclude this group of elements from it".

CodePudding user response：

One approach is to use the self:: axis and the not() function in a predicate.
For example, with an XML document like this

<root>
    <tr>
        <td>1</td>
        <td>2</td>
        <td>3</td>
        <td>4</td>
        <td>5</td>
    </tr>    
    <dr>
        <td>1</td>
        <td>4</td>
    </dr>    
</root>

you can use this XPath-1.0 expression:

//tr/td[not(self::*=//dr/td)]

which can be shortened to

//tr/td[not(.=//dr/td)]

The resulting node-set is as desired:

<td>2</td>
<td>3</td>
<td>5</td>

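The XPath expression selects all elements of the first part and, in the predicate, checks whether each element (self::* or .) equals any element of the second part. If it does, not(...) excludes it. Note that = compares string values, not node identity: a td is excluded whenever its text matches the text of any //dr/td, even though the two are different nodes.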

You can also apply this approach to attribute nodes. In that case you have to use ., because self::* matches only elements. So self::* can always be replaced by ., but not the other way round. (The fully general equivalent of . is self::node().)
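
As a rough way to try the expression outside a browser, the sketch below evaluates it with Python. It assumes the third-party lxml package is installed (the standard library's ElementTree does not support not()); the helper name texts is made up for this illustration.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import etree

XML = """
<root>
    <tr>
        <td>1</td>
        <td>2</td>
        <td>3</td>
        <td>4</td>
        <td>5</td>
    </tr>
    <dr>
        <td>1</td>
        <td>4</td>
    </dr>
</root>
""".strip()


def texts(xml, expr):
    """Hypothetical helper: evaluate an XPath 1.0 expression, return each hit's text."""
    root = etree.fromstring(xml)
    return [node.text for node in root.xpath(expr)]


# Keep every //tr/td whose string value matches no //dr/td.
remaining = texts(XML, "//tr/td[not(. = //dr/td)]")
print(remaining)
```

With the sample document this prints ['2', '3', '5']. Because the comparison is on string values, any other td in the tr with the text 1 or 4 would be dropped as well.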

CodePudding user response：

You can use the not() function here, either joined with the and operator or in chained predicates (which act as a logical and).
For your specific example you can use the following XPath

"//tr/td[not(text()='1')][not(text()='4')]"

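A small sketch of this predicate approach, again assuming the third-party lxml package, compares the chained-predicate form with the equivalent single predicate using and. XPath string literals need single or double quotes; backticks are not valid.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import etree

ROW = "<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>"
row = etree.fromstring(ROW)

# Two chained predicates: a td must pass both filters.
chained = row.xpath("//tr/td[not(text()='1')][not(text()='4')]")

# The same condition written as one predicate with the and operator.
combined = row.xpath("//tr/td[not(text()='1') and not(text()='4')]")

print([td.text for td in chained])
print([td.text for td in combined])
```

Both forms print ['2', '3', '5'] for this row. The excluded values are hard-coded, so this suits a short, known list rather than an arbitrary second XPath.
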
CodePudding user response：

In XPath 2.0 there is an operator for this: except. If E and F are general expressions returning sets of nodes, then E except F returns all nodes selected by E that are not selected by F.

There's no convenient way of doing the same thing in XPath 1.0, but the rather cumbersome (and potentially expensive) expression E[count(.|F) != count(F)] is equivalent (though you need to take care with the context in which F is evaluated: inside the predicate, the context node is each node selected by E in turn).

In many practical cases you can achieve the desired effect with a filter predicate, for example //td[not(ancestor::tr)].
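
The count-based expression can be sketched in Python as follows, assuming the third-party lxml package. The helper except_xpath is a made-up name that just builds the expression string; it wraps E in parentheses so that a union in E is filtered as a whole, and it expects F to be an absolute path because F is evaluated inside the predicate.

```python
# Assumption: the third-party lxml package is available (pip install lxml).
from lxml import etree

DOC = etree.fromstring(
    "<root>"
    "<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>"
    "<dr><td>1</td><td>4</td></dr>"
    "</root>"
)


def except_xpath(e, f):
    """Hypothetical helper: build the XPath 1.0 form of `e except f`.

    A node of `e` belongs to `f` exactly when adding it to `f` does not
    change the size of `f`, so the predicate keeps the nodes where it does.
    """
    return f"({e})[count(. | {f}) != count({f})]"


# F selects the first and fourth td of the same row: real members of E.
by_position = except_xpath("//tr/td", "//tr/td[position() = 1 or position() = 4]")

# F selects the td elements under dr: same text, but different nodes.
by_identity = except_xpath("//tr/td", "//dr/td")

print([td.text for td in DOC.xpath(by_position)])
print([td.text for td in DOC.xpath(by_identity)])
```

The first query prints ['2', '3', '5']. The second prints all five values, because the test is on node identity and no //dr/td node is itself a member of //tr/td.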