Clicking through pagination links that appear in sets with selenium


This is my first time using Selenium. The website I'm scraping (page) doesn't have a "next page" button, and the pagination links don't change until you click the "...", which then shows the next set of 10 pagination links. How do I loop through clicking them?

I've seen a few answers online, but I couldn't adapt them to my code because the links only come in sets. This is my code:

    from selenium.webdriver import Chrome
    from selenium.webdriver.support.ui import Select
    from selenium.webdriver.common.by import By

    url = 'https://business.nh.gov/nsor/search.aspx'
    # raw string so the backslashes in the Windows path are kept literally
    driver_path = r'Projects\Selenium Driver\chromedriver_win32'
    driver = Chrome(executable_path=driver_path)
    driver.get(url)
    # choose the state in the drop-down and submit the search form
    drop_down = driver.find_element(By.ID, 'ctl00_cphMain_lstStates')
    select = Select(drop_down)
    select.select_by_visible_text('NEW HAMPSHIRE')
    driver.find_element(By.ID, 'ctl00_cphMain_btnSubmit').click()
    # every anchor in the results table: offender links first, then the pager links
    content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
    hrefs = []
    for link_el in content:
        href = link_el.get_attribute('href')
        hrefs.append(href)
    offenders_href = hrefs[:10]
    pagination_links = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender tbody tr td table tbody a')

CodePudding user response:

You can try executing a script, e.g. driver.execute_script("javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page$5')"), and you will be redirected to the fifth page.
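
A minimal sketch of how that call could be wrapped so that any page number can be requested. The helper names `postback_script` and `go_to_page` are hypothetical; the control ID and the `Page$N` argument are copied from the call above, and the sketch assumes the site accepts any valid page number this way. The script is built without the `javascript:` prefix, which `execute_script` does not need.

```python
GRID_ID = 'ctl00$cphMain$gvwOffender'  # event target copied from the call above


def postback_script(page, grid_id=GRID_ID):
    """Build the JavaScript that asks the grid to show the given page."""
    return f"__doPostBack('{grid_id}','Page${page}')"


def go_to_page(driver, page):
    """Run the postback in the browser; `driver` is a Selenium WebDriver."""
    driver.execute_script(postback_script(page))
```

Calling `go_to_page(driver, 5)` would then issue the same request as the one-liner above. Because it never touches the pager links, this approach would not depend on which set of 10 links is currently shown.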

CodePudding user response:

With your current code, the pagination link elements are already captured in the list slice content[10:]. The last pagination hyperlink, the one shown as an ellipsis, actually points to the next page in the sequence. Using this fact, we can keep a current-page variable to track the page being visited and use it to identify the right anchor element in content for the next page.
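
The matching step described above can be sketched on plain strings, without a browser. The function name `find_next_page_index` and the sample `pager` list are hypothetical; the hrefs are assumed to have the `__doPostBack(...,'Page$N')` form used by this site's pager.

```python
def find_next_page_index(hrefs, next_page):
    """Return the index of the pager href that targets `next_page`, or None."""
    # The leading '$' keeps page 1 from matching page 11, 21, and so on.
    suffix = f"${next_page}')"
    for i, href in enumerate(hrefs):
        if href and href.endswith(suffix):
            return i
    return None


# Hypothetical pager hrefs as they might look while page 10 is displayed;
# the last entry stands for the "..." link, which targets page 11.
pager = [
    f"javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page${n}')"
    for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 11)
]
print(find_next_page_index(pager, 11))  # 9, the position of the "..." link
```

A return value of `None` means no link targets the requested page, which can serve as the signal that the last page has been reached.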

With do-while loop logic, and using your code to scrape the required elements, here is the primary code:

    offenders_href = list()
    curr_page = 1
    while True:
        # find all anchor tags within this table
        content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
        hrefs = []
        for link_el in content:
            href = link_el.get_attribute('href')
            hrefs.append(href)
        # the first 10 anchors are the offender links on this page
        offenders_href += hrefs[:10]
        curr_page += 1
        # find next page element
        next_page = None
        for page_elem in content[10:]:
            if page_elem.get_attribute("href").endswith('$' + str(curr_page) + "')"):
                next_page = page_elem
                break
        if next_page is None:
            # last page reached, break out of while
            break
        print(f'clicking {next_page.text}...')
        next_page.click()

I placed this code in a function launch_click_pages. Launched with your URL, it is able to click through the pages (it kept going, but I stopped it at some page):

>>> launch_click_pages('https://business.nh.gov/nsor/search.aspx')
clicking 2...
clicking 3...
clicking 4...
clicking 5...
clicking 6...
clicking 7...
clicking 8...
clicking 9...
clicking 10...
clicking ......
clicking 12...
clicking 13...
clicking 14...
clicking 15...