I'm scraping a webpage with Python and Selenium and ran into a problem. The page shows its data in a paginated table, and I want to collect the data from every page. This is the HTML of the pagination control when I'm on a page other than the last one (page 2 in this case):
<span >
" ["
<a href="?page=1">First</a>
"/"
<a href="?page=2">Previous</a>
"] "
<a href="?page=1" title="Go to page 1">1</a>
", "
<strong>2</strong>
", "
<a href="?page=3" title="Go to page 3">3</a>
" ["
<a href="?page=3">Next</a>
"/"
<a href="?page=3">Last</a>
"] "
</span>
And this is the HTML I get when I reach the last page (page 3 in this case):
<span >
" ["
<a href="?page=1">First</a>
"/"
<a href="?page=2">Previous</a>
"] "
<a href="?page=1" title="Go to page 1">1</a>
", "
<a href="?page=2" title="Go to page 2">2</a>
", "
<strong>3</strong>
" [Next/Last]"
</span>
In this case, page 3 is selected and appears as <strong>, but this changes depending on the current page.
To detect the last page, I want to check whether the text "[Next/Last]" immediately follows the <strong> tag, so that I can stop the while loop that retrieves the information. But since this text isn't inside any tag, I haven't found a way to check for it. How can I do that?
CodePudding user response:
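We can look for an <a> element with an href attribute and Next as its text content. On the last page, Next is plain text rather than a link, so no such element exists there. The same can be done for the Last text.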
With Selenium / Python you can simply use this line:
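from selenium.webdriver.common.by import By
# Assumption: the pagination span has class "pagelinks" (the attribute name is missing above).
if driver.find_elements(By.XPATH, "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"):
    # Do what you need to do while still not on the last
    # page. Otherwise, this block will be skipped.
    pass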
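To sanity-check the idea without launching a browser, the two pagination snippets from the question can be parsed with the standard library's xml.etree.ElementTree. This is only a rough sketch: the helper names has_next_link and text_after_current_page are made up for illustration, the markup is retyped from the question with the quoted text nodes written as plain text, and the span's attributes are left out because they aren't visible in the question. The second helper shows that the text following <strong> is reachable as the element's tail in ElementTree; Selenium's find_elements returns elements rather than bare text nodes, so checking for the Next link is usually the simpler test there.

```python
import xml.etree.ElementTree as ET

# Pagination markup retyped from the question (page 2 of 3, then page 3 of 3).
MIDDLE_PAGE = (
    '<span> [<a href="?page=1">First</a>/<a href="?page=2">Previous</a>] '
    '<a href="?page=1" title="Go to page 1">1</a>, <strong>2</strong>, '
    '<a href="?page=3" title="Go to page 3">3</a> '
    '[<a href="?page=3">Next</a>/<a href="?page=3">Last</a>] </span>'
)
LAST_PAGE = (
    '<span> [<a href="?page=1">First</a>/<a href="?page=2">Previous</a>] '
    '<a href="?page=1" title="Go to page 1">1</a>, '
    '<a href="?page=2" title="Go to page 2">2</a>, '
    '<strong>3</strong> [Next/Last]</span>'
)


def has_next_link(fragment):
    """Return True if the fragment contains an <a href=...> whose text is 'Next'."""
    root = ET.fromstring(fragment)
    return any(
        a.get("href") is not None and (a.text or "").strip() == "Next"
        for a in root.iter("a")
    )


def text_after_current_page(fragment):
    """Return the text that directly follows the <strong> current-page marker."""
    root = ET.fromstring(fragment)
    strong = root.find("strong")
    # ElementTree stores the text after an element's closing tag in .tail.
    return (strong.tail or "").strip()


print(has_next_link(MIDDLE_PAGE))          # True: Next is still a link
print(has_next_link(LAST_PAGE))            # False: Next is plain text
print(text_after_current_page(LAST_PAGE))  # [Next/Last]
```

In the Selenium loop, the same condition would be the result of the find_elements call above: keep clicking the Next link while the returned list is non-empty, and stop once it comes back empty.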