Get first element with XPath

I have HTML like this:

<ol >
   <li  id="37647629">
      <!---->
      <div>
         <!---->
         <div>
            <!---->
            <book >
              <div class="title">
                 someText
              </div>
              <div class="year">
                 2022
              </div>
            </book>
         </div>
         <!---->         
      </div>
      <!---->
   </li>
   <li  id="37647778">
      <!---->
      <div>
         <!---->
         <div>
            <!---->
            <book >
              <div class="title">
                 someOtherText
              </div>
              <div class="year">
                 2014
              </div>
            </book>
         </div>
      </div>
      <!---->
   </li>   
</ol>

I want to get the first book's title and year directly, with two XPath expressions. I tried:

$x('//book') => OK, returns a list with both books

$x('//book[0]') => Empty list

$x('//book[0]/div[@class="title"]') => Nothing

Seems I have to do this :

$x('//book')[0]

and then process the title. But why can't I do this with XPath alone and access the first title directly with a single XPath expression?

CodePudding user response：

This will give you the first book's title:

"(//book)[1]//div[@class='title']"

And this gives the first book's year:

"(//book)[1]//div[@class='year']"

CodePudding user response：

You're missing that XPath indexing starts at 1; JavaScript indexing starts at 0.

- $x('//book') selects all book elements in the document.
- $x('//book[0]') selects nothing because XPath indexing starts at 1, so no node ever has position 0. (Note that even $x('//book[1]') selects every book element that is the first book among its siblings, which is not necessarily the same as the first of all book elements in the document.)
- $x('//book')[0] selects the first book element because JavaScript indexing starts at 0.
- $x('(//book)[1]') selects the first book element because XPath indexing starts at 1 and the parentheses apply the index to the whole result.
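
The difference between indexing among siblings and indexing over the whole result can be modelled in plain JavaScript. This is only a rough sketch, not an XPath engine: the object tree, the `title` property, and the helper names `descendants`, `nthAmongSiblings`, and `nthInDocument` are all made up for illustration and simply mimic the two selection rules on a simplified copy of the markup from the question.

```javascript
// Toy model of the question's markup: each <div> wrapper holds one <book>.
// The `title` property is a simplification standing in for the title <div>.
const doc = {
  tag: "ol",
  children: [
    { tag: "li", children: [{ tag: "div", children: [
      { tag: "book", title: "someText", children: [] },
    ] }] },
    { tag: "li", children: [{ tag: "div", children: [
      { tag: "book", title: "someOtherText", children: [] },
    ] }] },
  ],
};

// Roughly like //tag: every descendant with that tag, in document order.
function descendants(node, tag) {
  const found = [];
  for (const child of node.children) {
    if (child.tag === tag) found.push(child);
    found.push(...descendants(child, tag));
  }
  return found;
}

// Roughly like //tag[n]: under each parent, keep the n-th child with that tag.
function nthAmongSiblings(node, tag, n) {
  const found = [];
  let seen = 0; // children with this tag passed so far under this parent
  for (const child of node.children) {
    if (child.tag === tag && ++seen === n) found.push(child);
    found.push(...nthAmongSiblings(child, tag, n));
  }
  return found;
}

// Roughly like (//tag)[n]: collect all matches first, then take the n-th.
function nthInDocument(node, tag, n) {
  return descendants(node, tag)[n - 1]; // 1-based position, 0-based array
}

const perParent = nthAmongSiblings(doc, "book", 1).map((b) => b.title);
const overall = nthInDocument(doc, "book", 1).title;
const zeroth = nthAmongSiblings(doc, "book", 0);

console.log(perParent);     // both titles: each book is first under its own parent
console.log(overall);       // only the first book's title
console.log(zeroth.length); // 0: position 0 never matches
```

In this model each book sits alone inside its own wrapper, so the per-parent rule keeps both books, while the parenthesised form keeps only the first one.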

To select the first div with class of 'title', all in XPath:

$x('(//div[@class="title"])[1]')

or, using JavaScript to index:

$x('(//div[@class="title"])')[0]

To return just the string value without the leading/trailing whitespace, wrap in normalize-space():

$x('normalize-space((//div[@class="title"])[1])')

Note that normalize-space() will also consolidate internal whitespace, but that is of no consequence with this example.
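
Since $x is a DevTools console helper, it is not available in ordinary page scripts. If you need the same string result there, something along these lines should work with the standard document.evaluate API. This is a minimal sketch: the helper name `xpathString` is hypothetical, and it assumes the title div carries class="title" as in the expressions above.

```javascript
// Hypothetical helper: evaluates an XPath expression that yields a string
// against `doc` (a browser Document) and returns that string.
function xpathString(doc, expression) {
  const result = doc.evaluate(
    expression,
    doc,                     // context node: start from the document root
    null,                    // no namespace resolver needed for plain HTML
    XPathResult.STRING_TYPE, // ask for a string instead of a node set
    null                     // do not reuse an existing XPathResult
  );
  return result.stringValue;
}

// Usage in a page script (or in the console):
// xpathString(document, 'normalize-space((//div[@class="title"])[1])');
```

Requesting XPathResult.STRING_TYPE lets normalize-space() do the trimming inside XPath, so no extra JavaScript string handling is needed afterwards.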
