How to stop the selenium webdriver after reaching the last page while scraping the website?

Time:03-15

The number of pages on the site keeps changing, and I need to scrape all of them by looping through the pagination. Website: https://monentreprise.bj/page/annonces

Code I tried:

xpath= "//*[@id='yw3']/li[12]/a"        
while True:
    next_page = driver.find_elements(By.XPATH,xpath)
    if len(next_page) < 1:
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
        print('ok')

'ok' is printed continuously and the loop never stops.

CodePudding user response:

Because the condition len(next_page) < 1 is always False.

For instance, I tried the URL (screenshot not preserved).

CodePudding user response:

There are several issues here:

  1. //*[@id='yw3']/li[12]/a is not the correct locator for the "next" pagination button.
  2. A more reliable way to detect that the last page has been reached is to check whether the element matched by the CSS selector .pagination .next has the disabled class.
  3. You have to scroll the page down before clicking the "next" button.
  4. You have to add a delay after clicking the pagination button; otherwise this will not work.

This code worked for me:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
my_url = "https://monentreprise.bj/page/annonces"
driver.get(my_url)
# Element wrapping the "next" arrow; it carries the "disabled" class on the last page
next_page_parent = '.pagination .next'
# The clickable "next" arrow itself
next_page_parent_arrow = '.pagination .next a'
while True:
    # Scroll to the bottom so the pagination bar is in view
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(0.5)
    parent = driver.find_element(By.CSS_SELECTOR,next_page_parent)
    class_name = parent.get_attribute("class")
    # Stop as soon as the "next" button is disabled
    if "disabled" in class_name:
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_parent_arrow))).click()
        # Give the next page time to load before checking the button again
        time.sleep(1.5)
        print('ok')

The output is:

ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
No more pages
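
If you want the stop condition in a reusable form, here is a minimal sketch of the same idea. The names has_class, click_through_pages, max_pages and pause are my own and not part of Selenium. It splits the class attribute into tokens instead of doing a substring test, so a class such as "not-disabled" would not trigger a false stop, and it adds a safety cap so that a changed page layout cannot loop forever. For brevity it leaves out the scrolling and the WebDriverWait from the code above; with a real browser you would probably want to keep both.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR; By.CSS_SELECTOR works here too.
CSS = "css selector"


def has_class(class_attr, name):
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in (class_attr or "").split()


def click_through_pages(driver, max_pages=1000, pause=1.5):
    """Click the "next" arrow until its parent carries the "disabled" class.

    Returns the number of clicks performed. `max_pages` is a hypothetical
    safety cap and `pause` is the delay (in seconds) after each click.
    """
    clicks = 0
    while clicks < max_pages:
        parent = driver.find_element(CSS, ".pagination .next")
        if has_class(parent.get_attribute("class"), "disabled"):
            break  # last page reached
        driver.find_element(CSS, ".pagination .next a").click()
        clicks += 1
        time.sleep(pause)  # let the next page load before checking again
    return clicks
```

With the driver from the code above you would call click_through_pages(driver) right after driver.get(my_url) and do the actual scraping inside the loop before each click.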