I am learning Scrapy and building a simple crawler to reinforce what I have learned. I am having trouble getting the next-page link, which is located in the a of the final li. Can anyone point me in the right direction?
The pagination div is as follows:
<div >
<ul>
<li><a href="./viewforum.php?f=399&start=40" data-original-title="" title=""><i
></i></a></li>
<li><a href="./viewforum.php?f=399" data-original-title="" title="">1</a></li>
<span >, </span>
<li><a href="./viewforum.php?f=399&start=40" data-original-title="" title="">2</a></li>
<span >, </span>
<li ><a data-original-title="" title="">3</a></li>
<span >, </span>
<li><a href="./viewforum.php?f=399&start=120" data-original-title="" title="">4</a></li>
<span >, </span>
<li><a href="./viewforum.php?f=399&start=160" data-original-title="" title="">5</a></li>
<span >, </span>
<li><a href="./viewforum.php?f=399&start=200" data-original-title="" title="">6</a></li>
<li ><a href="#" onclick="jumpto(); return false;" title=""
data-original-title="Jump to page"> ... </a></li>
<li><a href="./viewforum.php?f=399&start=311244" data-original-title="" title="">10012</a></li>
<li><a href="./viewforum.php?f=399&start=120" data-original-title="" title=""><i
></i></a></li>
</ul>
</div>
I have tried different variations of the following, but the wrong li is returned: I still get the class=active li even though I used li:not([]):
response.css('div.pagination.pagination-small.hidden-phone').css('li:not([])').get()
example:
>>> response.css('div.pagination.pagination-small.hidden-phone').css('li:not([])').get()
'<li ><a>1</a></li>'
Thanks
Answer:
Since the next-page link is in the last li of the list, we can use that to our advantage.
css:
In [1]: response.css('div.pagination li:last-child a::attr(href)').get()
Out[1]: './viewforum.php?f=399&start=120'
xpath:
In [2]: response.xpath('//div[contains(@class, "pagination")]//li[last()]/a/@href').get()
Out[2]: './viewforum.php?f=399&start=120'
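Note that the extracted href is relative, so it has to be resolved against the URL of the current page before it can be requested. Here is a minimal sketch of that step using only the standard library. The helper name absolute_next_url and the forum.example.com address are made up for illustration, and the sketch assumes the last li holds the "next" arrow, which may not be true on the final page:

```python
from typing import Optional
from urllib.parse import urljoin


def absolute_next_url(page_url: str, href: Optional[str]) -> Optional[str]:
    """Resolve a relative pagination href against the current page URL.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # .get() returns None when nothing matches, and the "jump to page"
    # link uses href="#", so neither is treated as a next page.
    if not href or href.startswith("#"):
        return None
    return urljoin(page_url, href)


# Assumed URL of the page being parsed (the real site address is not
# given in the question).
page_url = "https://forum.example.com/viewforum.php?f=399&start=80"

print(absolute_next_url(page_url, "./viewforum.php?f=399&start=120"))
# prints https://forum.example.com/viewforum.php?f=399&start=120
```

Inside a spider you would normally not need a helper like this: response.urljoin(href) does the same resolution, and response.follow(href, callback=self.parse) accepts the relative href directly.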