How to stop the Selenium WebDriver after reaching the last page while scraping a website?

The amount of data (number of pages) on the site keeps changing, and I need to scrape all the pages by looping through the pagination. Website: https://monentreprise.bj/page/annonces

Code I tried:

xpath = "//*[@id='yw3']/li[12]/a"
while True:
    next_page = driver.find_elements(By.XPATH,xpath)
    if len(next_page) < 1:
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
        print('ok')

The loop never ends: 'ok' is printed continuously.

CodePudding user response：

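The loop never ends because the condition len(next_page) < 1 is always False: find_elements keeps returning at least one match for that XPath, so the break is never reached.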

For instance I tried the url

CodePudding user response：

There are several issues here:

1. //*[@id='yw3']/li[12]/a is not a correct locator for the "next" pagination button.
2. A better indication that the last page has been reached is to check whether the element matched by the CSS selector .pagination .next has the disabled class.
3. You have to scroll the page down before clicking the "next" button.
4. You have to add a delay after clicking the pagination button; otherwise this will not work.

This code worked for me:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
my_url = "https://monentreprise.bj/page/annonces"
driver.get(my_url)

# The <li> wrapping the "next" link gets the "disabled" class on the last page.
next_page_parent = '.pagination .next'
next_page_parent_arrow = '.pagination .next a'

while True:
    # Scroll to the bottom so the pagination bar is in view before clicking.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(0.5)
    parent = driver.find_element(By.CSS_SELECTOR, next_page_parent)
    class_name = parent.get_attribute("class")
    if "disabled" in class_name:
        # Last page reached: leave the loop.
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_parent_arrow))).click()
        # Give the next page time to load before checking the button again.
        time.sleep(1.5)
        print('ok')

The output is:

ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
No more pages
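
If you want to check the stopping logic without opening a browser, the loop above can be pulled out into a small helper. This is only a sketch under a few assumptions: the names has_class and click_through_pages are made up for illustration, the max_pages safety cap is my addition (it is not in the code above), and the class check compares whole class tokens instead of doing a substring match, so a class such as not-disabled would not be mistaken for disabled.

```python
from typing import Callable, Optional


def has_class(class_attr: Optional[str], name: str) -> bool:
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in (class_attr or "").split()


def click_through_pages(
    get_next_class: Callable[[], Optional[str]],
    click_next: Callable[[], None],
    max_pages: int = 1000,
) -> int:
    """Click "next" until its parent element carries the `disabled` class.

    `get_next_class` returns the class attribute of the "next" element,
    `click_next` performs the click (plus any scrolling or waiting).
    `max_pages` is a hypothetical safety cap so that a changed page layout
    cannot cause an endless loop. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_pages:
        if has_class(get_next_class(), "disabled"):
            break
        click_next()
        clicks += 1
    return clicks


if __name__ == "__main__":
    # Simulate a site with 4 pages: "next" is disabled once page 4 is shown.
    state = {"page": 1}

    def fake_class() -> str:
        return "next disabled" if state["page"] >= 4 else "next"

    def fake_click() -> None:
        state["page"] += 1

    print(click_through_pages(fake_class, fake_click))
```

With Selenium, get_next_class could be a function that returns driver.find_element(By.CSS_SELECTOR, '.pagination .next').get_attribute("class"), and click_next could wrap the scroll, the WebDriverWait(...).click() call and the sleep from the code above.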