Clicking through pagination links that appear in sets with selenium

This is my first time using Selenium. The website I'm scraping doesn't have a "next page" button, and the pagination links don't change until you click the "...", which then shows the next set of 10 pagination links. How do I loop through clicking them?

I've seen a few answers online, but I couldn't adapt them to my code because the links only come in sets. This is the code:

from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By

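# raw string so the backslashes in the Windows path are not read as escape sequences
driver_path = r'Projects\Selenium Driver\chromedriver_win32'
# note: Selenium 4.10+ removed executable_path; there, pass service=Service(driver_path) instead
driver = Chrome(executable_path=driver_path)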
driver.get('https://business.nh.gov/nsor/search.aspx')
drop_down = driver.find_element(By.ID, 'ctl00_cphMain_lstStates')
select = Select(drop_down)
select.select_by_visible_text('NEW HAMPSHIRE')
driver.find_element(By.ID, 'ctl00_cphMain_btnSubmit').click()
content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
hrefs = []
for link_el in content:
    href = link_el.get_attribute('href')
    hrefs.append(href)
offenders_href = hrefs[:10]
pagination_links = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender tbody tr td table tbody a')

CodePudding user response：

You can try executing a script, e.g. driver.execute_script("javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page$5')"), and you will be taken to the fifth page.
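
As a rough sketch of how this could be turned into a loop: the helper below builds the postback script for a given page number and runs it for a range of pages. The names postback_script and visit_pages are made up for illustration, the fixed delay is an assumption, and I have not checked what the site does when asked for a page past the last one, so the page range is passed in explicitly. The javascript: prefix is left out here because execute_script already takes plain JavaScript.

```python
from time import sleep

GRID_ID = 'ctl00$cphMain$gvwOffender'  # event target taken from the call above


def postback_script(page, target=GRID_ID):
    """Build the JavaScript that asks the grid for the given page number."""
    return f"__doPostBack('{target}','Page${page}')"


def visit_pages(driver, first, last, delay=1.0):
    """Hypothetical helper: jump to pages first..last, yielding each page number."""
    for page in range(first, last + 1):
        driver.execute_script(postback_script(page))
        sleep(delay)  # crude wait for the postback to finish (assumption)
        yield page
```

With the driver from the question, `for page in visit_pages(driver, 2, 15):` would give you a place to scrape each page's links inside the loop body.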

CodePudding user response：

With your current code, the pagination link elements are already captured in the list slice content[10:]. The last pagination link, the one shown as an ellipsis, actually points to the next page in sequence. Using this fact, we can keep a current-page variable to track the page being visited and use it to pick the right anchor element in content for the next page.
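
To make the matching concrete, here is a small sketch that works on plain href strings instead of live elements. The sample hrefs are my assumption about what the pager links look like (modeled on the __doPostBack call format), and find_next_page_index is just an illustrative name.

```python
def find_next_page_index(pager_hrefs, next_page):
    """Return the index of the pager link whose href targets next_page, or None."""
    # Including the '$' keeps page 1 from matching an href that ends in 'Page$11')'.
    suffix = "$" + str(next_page) + "')"
    for i, href in enumerate(pager_hrefs):
        if href.endswith(suffix):
            return i
    return None


# Assumed pager hrefs while pages 1-10 are shown; the last entry is the "..." link.
pager = [
    f"javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page${n}')"
    for n in range(2, 12)
]
print(find_next_page_index(pager, 2))   # 0, the link for page 2
print(find_next_page_index(pager, 11))  # 9, the "..." link
print(find_next_page_index(pager, 12))  # None, no such link in this set
```

A None result is what ends the loop: once no link targets the next page number, the last page has been reached.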

Using do-while style loop logic and your code for scraping the required elements, here is the primary code:

    from time import sleep

    offenders_href = list()
    curr_page = 1
    while True:
        # find all anchor tags within this table
        content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
        hrefs = []
        for link_el in content:
            href = link_el.get_attribute('href')
            hrefs.append(href)
        # the first 10 anchors are the offender links on this page
        offenders_href += hrefs[:10]
        curr_page += 1
        # find the pagination link whose href targets the next page
        for page_elem in content[10:]:
            if page_elem.get_attribute("href").endswith('$' + str(curr_page) + "')"):
                next_page = page_elem
                break
        else:
            # last page reached, break out of while
            break
        print(f'clicking {next_page.text}...')
        next_page.click()
        sleep(1)  # give the page time to reload

I placed this code in a function, launch_click_pages. Launched with your URL, it is able to click through the pages (it kept going, but I stopped it partway through):

>>> launch_click_pages('https://business.nh.gov/nsor/search.aspx')
clicking 2...
clicking 3...
clicking 4...
clicking 5...
clicking 6...
clicking 7...
clicking 8...
clicking 9...
clicking 10...
clicking ......
clicking 12...
clicking 13...
clicking 14...
clicking 15...
^C
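
One caveat about the loop: the fixed sleep(1) assumes every postback finishes within a second. A minimal sketch of a polling alternative is below. wait_until is a hypothetical helper, not part of Selenium, and the timeout and poll values are arbitrary defaults.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25):
    """Call condition() repeatedly until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll)
```

The condition could be any zero-argument callable that tells you the new page has loaded. Selenium also ships its own WebDriverWait together with expected_conditions.staleness_of, which is probably the better fit if you want to wait for the clicked link to disappear from the old page.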