This is my first time using Selenium. The website I'm scraping (page) doesn't have a "next page" button, and the pagination links don't change until you click the "...", which then shows the next set of 10 pagination links. How do I loop through the clicking?
I've seen a few answers online, but I couldn't adapt them to my code because the links only come in sets. This is the code:
from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
driver_path = r'Projects\Selenium Driver\chromedriver_win32'
driver = Chrome(executable_path=driver_path)
driver.get('https://business.nh.gov/nsor/search.aspx')
drop_down = driver.find_element(By.ID, 'ctl00_cphMain_lstStates')
select = Select(drop_down)
select.select_by_visible_text('NEW HAMPSHIRE')
driver.find_element(By.ID, 'ctl00_cphMain_btnSubmit').click()
content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
hrefs = []
for link_el in content:
    href = link_el.get_attribute('href')
    hrefs.append(href)
offenders_href = hrefs[:10]
pagination_links = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender tbody tr td table tbody a')
CodePudding user response:
You can try executing a script, e.g. driver.execute_script("javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page$5')")
and you will be taken to the fifth page.
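To loop with this approach, the call can be built for any page number. The following is only a minimal sketch: the names GRID_TARGET, postback_script and go_to_page are hypothetical, and it assumes the grid accepts a 'Page$N' argument for pages that are not currently shown in the pager, which I have not verified for this site.

```python
# Hypothetical helpers; the control name is copied from the call shown above.
GRID_TARGET = 'ctl00$cphMain$gvwOffender'


def postback_script(page, target=GRID_TARGET):
    """Build the JavaScript that asks the grid to show the given page.

    The "javascript:" prefix is left out because execute_script
    takes plain JavaScript source.
    """
    return f"__doPostBack('{target}','Page${page}')"


def go_to_page(driver, page):
    """Run the postback on an already-open results page.

    `driver` is any object with an execute_script method,
    such as a Selenium WebDriver.
    """
    driver.execute_script(postback_script(page))


# Possible usage (assumes `driver` already shows the search results):
# for page in range(2, 6):
#     go_to_page(driver, page)
#     ...scrape the table here...
```

You would still need to wait for the table to reload after each call before scraping it.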
CodePudding user response:
With your current code, the pagination elements are already captured in the list slice content[10:]. The last hyperlink in the pager, the one shown as an ellipsis, actually points to the next page in sequence. Using this fact, we can use a current-page variable to keep track of the page being visited and use it to identify the right anchor element within content for the next page.
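The matching rule can be sketched on plain strings, without a browser. The function name find_next_page_href and the sample hrefs are hypothetical; the hrefs only imitate the postback format this site appears to use.

```python
def find_next_page_href(pager_hrefs, next_page):
    """Return the pager href that targets `next_page`, or None if absent."""
    # The leading "$" keeps page 1 from matching "Page$11", and so on.
    suffix = "$" + str(next_page) + "')"
    for href in pager_hrefs:
        if href is not None and href.endswith(suffix):
            return href
    return None


# Illustrative pager hrefs for a first set of pages (assumed format).
sample = [
    "javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page$%d')" % n
    for n in range(2, 12)  # pages 2..10, plus the "..." link for page 11
]

print(find_next_page_href(sample, 11))  # the href behind the "..." link
print(find_next_page_href(sample, 12))  # None: not in the current set
```

Because the "..." link carries the next page number in its href, the same lookup works for both numbered links and the ellipsis.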
Using do-while style loop logic and your code to scrape the required elements, here is the primary code:
from time import sleep

offenders_href = list()
curr_page = 1
while True:
    # find all anchor tags within this table
    content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
    hrefs = []
    for link_el in content:
        href = link_el.get_attribute('href')
        hrefs.append(href)
    # the first 10 anchors are offender links; accumulate them across pages
    offenders_href += hrefs[:10]
    curr_page += 1
    # find next page element among the pagination anchors
    for page_elem in content[10:]:
        if page_elem.get_attribute("href").endswith('$' + str(curr_page) + "')"):
            next_page = page_elem
            break
    else:
        # last page reached, break out of while
        break
    print(f'clicking {next_page.text}...')
    next_page.click()
    # give the page time to reload before querying it again
    sleep(1)
I placed this code in a function, launch_click_pages. Launched with your URL, it is able to click through the pages (it kept going, but I stopped it at some page):
>>> launch_click_pages('https://business.nh.gov/nsor/search.aspx')
clicking 2...
clicking 3...
clicking 4...
clicking 5...
clicking 6...
clicking 7...
clicking 8...
clicking 9...
clicking 10...
clicking ......
clicking 12...
clicking 13...
clicking 14...
clicking 15...
^C