If you have two XPath expressions, you can join them with the | operator to return both of their results in one result set. This essentially gives back the union of the two sets of elements. The example below returns all div elements and all span elements on a page:
//div | //span
What I need is the set difference: all elements in the first XPath group that are not in the second XPath group. So far I have seen that there is an except operator, but that only works in XPath 2.0. I need an XPath 1.0 solution. I have seen that the not() function might help, but I was not able to make it work.
As an example imagine the following:
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
</tr>
In this example I would have the XPath group //tr/td. I would want to exclude <td>1</td> and <td>4</td>. Although there are many ways to solve the problem, I am specifically looking for a solution where I can say in a single XPath expression: "Here is a group of elements; exclude this other group of elements from it".
CodePudding user response:
One approach is to use the self:: axis and the not() function in a predicate.
For example, with an XML document like this:
<root>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
</tr>
<dr>
<td>1</td>
<td>4</td>
</dr>
</root>
you can use this XPath 1.0 expression:
//tr/td[not(self::*=//dr/td)]
which can be shortened to
//tr/td[not(.=//dr/td)]
The resulting node-set is as desired:
<td>2</td>
<td>3</td>
<td>5</td>
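The XPath expression selects all elements of the first part and checks in the predicate whether each element itself (self::* or .) equals any node of the second part. If it does, it is excluded by not(...). Note that = compares string values here, not node identity, so an element is dropped whenever its text matches the text of any //dr/td element.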
You can also apply this approach to attribute nodes. In that case you have to use ., because self::* is more specific and only selects elements. So you could replace self::* by ., but not the other way round. (The . is the abbreviation for self::node(), the most general node test on the self axis.)
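To make the value comparison in //tr/td[not(.=//dr/td)] concrete, here is a minimal Python sketch. It does not evaluate the XPath itself: the standard library's xml.etree.ElementTree only supports a small XPath subset without not(), so the predicate is emulated in plain Python. The helper name exclude_by_value is made up for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>
  <dr><td>1</td><td>4</td></dr>
</root>
"""

def exclude_by_value(root, keep_path, drop_path):
    """Emulate keep_path[not(. = drop_path)] (hypothetical helper).

    Like the XPath '=' on node-sets, this compares string values,
    not node identity.
    """
    # "".join(itertext()) approximates the XPath string-value of an element.
    drop_values = {"".join(el.itertext()) for el in root.findall(drop_path)}
    return [el for el in root.findall(keep_path)
            if "".join(el.itertext()) not in drop_values]

root = ET.fromstring(DOC)
result = exclude_by_value(root, "./tr/td", "./dr/td")
print([el.text for el in result])  # ['2', '3', '5']
```

Because the comparison is by value, any td in tr whose text happens to match a td in dr is removed, even though they are different nodes.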
CodePudding user response:
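You can use the logical and operator and the not() function here; chaining two predicates, as below, has the same effect as joining the two conditions with and.
For your specific example you can use the following XPath:
"//tr/td[not(text()='1')][not(text()='4')]"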
CodePudding user response:
In XPath 2.0 there is an operator for this: except. If E and F are general expressions returning sets of nodes, then E except F returns all nodes selected by E that are not selected by F.
There's no convenient way of doing the same thing in XPath 1.0, but the rather cumbersome (and potentially expensive) expression E[count(.|F) != count(F)] is equivalent. You do need to take care about the context for evaluation of F: inside the predicate, F is evaluated with each node of E as the context node, so F should normally be an absolute path.
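The count-based expression tests node identity: a node of E is kept only if adding it to F makes the union larger, meaning it is not already a member of F. A rough Python sketch of that idea follows, again emulated with the standard library rather than evaluated as XPath; except_by_identity is a hypothetical helper name, and E and F are assumed here to be //td and //dr/td.

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>
  <dr><td>1</td><td>4</td></dr>
</root>
"""

def except_by_identity(e_nodes, f_nodes):
    """Emulate E[count(.|F) != count(F)] (hypothetical helper).

    A node is kept when it is not one of the nodes in F,
    regardless of what text it contains.
    """
    f_ids = {id(node) for node in f_nodes}
    return [node for node in e_nodes if id(node) not in f_ids]

root = ET.fromstring(DOC)
all_td = list(root.iter("td"))    # roughly //td, in document order
dr_td = root.findall("./dr/td")   # roughly //dr/td
remaining = except_by_identity(all_td, dr_td)
print([node.text for node in remaining])  # ['1', '2', '3', '4', '5']
```

All five td elements under tr survive, including those containing 1 and 4, because they are distinct nodes from the ones under dr. A string-value comparison such as //td[not(.=//dr/td)] would instead drop them.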
In many practical cases you can achieve the desired effect with a filter predicate, for example //td[not(ancestor::tr)].