Python Selenium get text out of tags

Time:11-10

I have been trying to scrape a webpage with Python and Selenium and ran into this problem. The page I'm scraping shows its information in a paginated table, and I want to get the information from all pages. This is the HTML for the pagination controls when I'm on a page that is not the last page (page 2 in this case):

<span >
   " ["
   <a href="?page=1">First</a>
   "/"
   <a href="?page=2">Previous</a>
   "] "
   <a href="?page=1" title="Go to page 1">1</a>
   ", "
   <strong>2</strong>
   ", "
   <a href="?page=3" title="Go to page 3">3</a>
   " ["
   <a href="?page=3">Next</a>
   "/"
   <a href="?page=3">Last</a>
   "] "
</span>

And this is the HTML I get when I reach last page (page 3 in this case):

<span >
   " ["
   <a href="?page=1">First</a>
   "/"
   <a href="?page=2">Previous</a>
   "] "
   <a href="?page=1" title="Go to page 1">1</a>
   ", "
   <a href="?page=2" title="Go to page 2">2</a>
   ", "
   <strong>3</strong>
   " [Next/Last]"
</span>

In this case page 3 is selected and appears as <strong>, but this changes depending on the current page.

To stop the while loop that retrieves the information, I want to check whether I'm on the last page, that is, whether the text "[Next/Last]" is the next text after the <strong> tag. Since this text is not inside any tag, I have found no way to check for it. How can I do this?

CodePudding user response:

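According to your updated explanation, we can look for an a element with an href attribute and "Next" as its text content. On the last page "Next" is plain text rather than a link, so no such element exists there. The same can be done for the "Last" text.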
With Selenium / Python you can simply use this line:

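from selenium.webdriver.common.by import By
# Assumption: the span's attribute is class; the attribute name was lost from the source.
if driver.find_elements(By.XPATH, "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"):
    # do what you need to do while still not on the last page;
    # otherwise this block will be skipped
    pass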
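
To show how this check could drive the whole scraping loop, here is a minimal sketch rather than a definitive implementation. It assumes the pagination span has class="pagelinks" (the attribute name is not visible in the question's HTML), and the names is_last_page, scrape_all_pages and scrape_page are hypothetical helpers introduced only for illustration. The first helper reads the text node directly after <strong> through JavaScript, which is the check the question originally asked for. The second uses the "is Next still a link" test from this answer.

```python
XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH

# Assumption: the pagination <span> has class="pagelinks".
NEXT_LINK = "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"
CURRENT_PAGE = "//span[@class='pagelinks']//strong"

# JavaScript that returns the text of the node directly after the given element.
TEXT_AFTER_JS = (
    "var n = arguments[0].nextSibling;"
    "return n ? n.textContent : '';"
)


def is_last_page(driver):
    """Return True when the text right after <strong> contains "[Next/Last]"."""
    strong = driver.find_element(XPATH, CURRENT_PAGE)
    text_after = driver.execute_script(TEXT_AFTER_JS, strong)
    return "[Next/Last]" in text_after


def scrape_all_pages(driver, scrape_page):
    """Run scrape_page(driver) on each page until "Next" is no longer a link."""
    rows = []
    while True:
        rows.extend(scrape_page(driver))
        next_links = driver.find_elements(XPATH, NEXT_LINK)
        if not next_links:  # "Next" is plain text on the last page
            break
        next_links[0].click()
    return rows
```

Here scrape_page is whatever function reads the table on the current page and returns a list of rows. If the table loads slowly after a click, an explicit wait before the next iteration may be needed.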