I have been trying to scrape a webpage with Python and Selenium and ran into this problem. The webpage I'm scraping shows information in a paginated table, and I want to get the information from all pages. This is the HTML for the pagination system when I'm on a page that is not the last page (page 2 in this case):
<span >
" ["
<a href="?page=1">First</a>
"/"
<a href="?page=2">Previous</a>
"] "
<a href="?page=1" title="Go to page 1">1</a>
", "
<strong>2</strong>
", "
<a href="?page=3" title="Go to page 3">3</a>
" ["
<a href="?page=3">Next</a>
"/"
<a href="?page=3">Last</a>
"] "
</span>
And this is the HTML I get when I reach the last page (page 3 in this case):
<span >
" ["
<a href="?page=1">First</a>
"/"
<a href="?page=2">Previous</a>
"] "
<a href="?page=1" title="Go to page 1">1</a>
", "
<a href="?page=2" title="Go to page 2">2</a>
", "
<strong>3</strong>
" [Next/Last]"
</span>
In this case page 3 is selected and appears as <strong>, but this changes depending on the current page.
To stop the while loop that retrieves the information, I want to detect the last page by checking whether the text "[Next/Last]" comes right after the <strong> tag. Since this text is not inside any tag, I have found no way to check it. How can I do this?
CodePudding user response:
According to your updated explanation, we can look for an a element with an href attribute and Next as its text content. The same can be done for the Last text.
With Selenium / Python you can simply use this line:
from selenium.webdriver.common.by import By
if driver.find_elements(By.XPATH, "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"):
    # do what you need to do while still not on the last page;
    # otherwise this block will be skipped
    pass
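To show how this check could drive the whole scraping loop, here is a minimal sketch rather than a definitive implementation. It assumes the pagination span carries class="pagelinks" (the attribute name is not visible in the posted HTML), and both the helper name scrape_all_pages and the read_rows callback are hypothetical names introduced for illustration.

```python
# Assumption: the pagination <span> has class="pagelinks".
NEXT_LINK_XPATH = "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"


def scrape_all_pages(driver, read_rows):
    """Collect rows from every page, following "Next" until it is no longer a link.

    driver    -- a Selenium WebDriver already showing the first page
    read_rows -- hypothetical callback: takes the driver, returns a list of rows
    """
    rows = []
    while True:
        # Read the table on the page that is currently displayed.
        rows.extend(read_rows(driver))

        # "xpath" is the value of selenium's By.XPATH constant; the literal
        # keeps this helper free of import-time dependencies.
        next_links = driver.find_elements("xpath", NEXT_LINK_XPATH)
        if not next_links:
            # On the last page "Next" is plain text, so no <a> matches.
            break

        # Go to the following page and repeat.
        next_links[0].click()
    return rows
```

It could be called as rows = scrape_all_pages(driver, my_table_reader), where my_table_reader is your own function that extracts the table rows. If the table is filled in asynchronously after the click, an explicit wait (for example with WebDriverWait) would probably be needed before reading the next page.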