I have HTML like this:
<ol >
<li id="37647629">
<!---->
<div>
<!---->
<div>
<!---->
<book >
<div >
someText
</div>
<div >
2022
</div>
</book>
</div>
<!---->
</div>
<!---->
</li>
<li id="37647778">
<!---->
<div>
<!---->
<div>
<!---->
<book >
<div >
someOtherText
</div>
<div >
2014
</div>
</book>
</div>
</div>
<!---->
</li>
</ol>
I want to get the first book's title and year directly, with two XPath expressions. I tried:
$x('//book') => OK, returns the list of both books
$x('//book[0]') => Empty list
$x('//book[0]/div[@class="title"]') => Nothing
It seems I have to do this:
$x('//book')[0]
and then process the title, but why can't I do this with XPath alone and access the first title directly with an XPath expression?
CodePudding user response:
This will give you the first book title
"(//book)[1]//div[@class='title']"
And this gives the first book year
"(//book)[1]//div[@class='year']"
CodePudding user response:
You're missing that XPath indexing starts at 1; JavaScript indexing starts at 0.
- $x('//book') selects all book elements in the document.
- $x('//book[0]') selects nothing because XPath indexing starts at 1. (Note also that a predicate written this way applies per parent: //book[1] selects every book element that is the first book among its siblings, which is not necessarily the same as the first of all book elements in the document.)
- $x('//book')[0] selects the first book element because JavaScript indexing starts at 0.
- $x('(//book)[1]') selects the first book element because XPath indexing starts at 1.
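The difference between the per-parent predicate and the parenthesized form can be sketched without a browser. The following is only a toy model, not an XPath engine: the object tree, the helper names (descendants, nthAmongSiblings, nthInDocument) and the title property are all hypothetical, and the nested div wrappers from the question are left out for brevity.

```javascript
// Toy model of the sample document: each <li> holds one <book>.
// (Hypothetical structure; the wrapper <div>s are omitted.)
const doc = {
  name: 'ol',
  children: [
    { name: 'li', children: [{ name: 'book', title: 'someText', children: [] }] },
    { name: 'li', children: [{ name: 'book', title: 'someOtherText', children: [] }] },
  ],
};

// All descendants of `node`, in document order.
function descendants(node) {
  return node.children.flatMap((child) => [child, ...descendants(child)]);
}

// Models //name[n]: for every parent, keep its n-th child called `name` (1-based).
function nthAmongSiblings(root, name, n) {
  const hits = new Set(
    [root, ...descendants(root)].map(
      (parent) => parent.children.filter((c) => c.name === name)[n - 1]
    )
  );
  // Filter the full descendant list so the result stays in document order.
  return descendants(root).filter((node) => hits.has(node));
}

// Models (//name)[n]: collect every match first, then take the n-th overall (1-based).
function nthInDocument(root, name, n) {
  const hit = descendants(root).filter((node) => node.name === name)[n - 1];
  return hit === undefined ? [] : [hit];
}

const titles = (nodes) => nodes.map((node) => node.title);

console.log(titles(nthAmongSiblings(doc, 'book', 1))); // both books: each is first in its own parent
console.log(titles(nthInDocument(doc, 'book', 1)));    // only the first book in the document
console.log(titles(nthAmongSiblings(doc, 'book', 0))); // empty: there is no position 0
```

With the question's markup, every book is the only book child of its parent, which is why the per-parent form matches both of them while the parenthesized form matches one.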
To select the first div with class of 'title', all in XPath:
$x('(//div[@class="title"])[1]')
or, using JavaScript to index:
$x('(//div[@class="title"])')[0]
To return just the string value without the leading/trailing whitespace, wrap in normalize-space():
$x('normalize-space((//div[@class="title"])[1])')
Note that normalize-space() will also consolidate internal whitespace, but that is of no consequence with this example.
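If the element is indexed on the JavaScript side instead, the same cleanup can be approximated on its text. This is a minimal sketch, assuming the XPath 1.0 definition of whitespace (space, tab, carriage return, line feed); normalizeSpace is a hypothetical helper name, not a built-in.

```javascript
// Approximates XPath 1.0 normalize-space(): collapse each run of XPath
// whitespace (space, tab, CR, LF) to one space, then strip the ends.
function normalizeSpace(text) {
  return text.replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

console.log(normalizeSpace('\n  someText\n'));    // leading/trailing whitespace removed
console.log(normalizeSpace('some \n\t  Text '));  // internal run collapsed to one space
```

The character class is spelled out rather than using \s because \s in JavaScript also matches characters, such as the no-break space, that XPath does not treat as whitespace.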
See also
- How to select first element via XPath? (And be sure not to miss the explanation of the difference between //book[1] and (//book)[1]: they are not the same.)