Selenium - Iterate over paginated site with extra random number

Time:06-22

The website I want to scrape is paginated, but I can't simply iterate over the page numbers because each page's URL also contains an extra, seemingly random number.

Here is the page:

https://market.bisnis.com/bursa-saham/2/20220621181040 (second page)
https://market.bisnis.com/bursa-saham/(page)/20220621181040 (general pattern)

If I just change the (page) part, I get a blank page. Here is my code, thanks!

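import xlsxwriter
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()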
options.add_argument("start-maximized")
options.add_argument('--no-sandbox')
  
element_list = []
  
for page in range(1,3, 1):
    
    page_url = "https://market.bisnis.com/bursa-saham/" + str(page)
    driver = webdriver.Chrome("C:/Users/krish/Desktop/chromedriver_win32/chromedriver.exe", chrome_options=options,)
    driver.get(page_url)
    title = driver.find_elements(By.TAG_NAME, 'h2')
  
    for i in range(len(title)):
        element_list.append([title[i].text])
  
with xlsxwriter.Workbook('result2.xlsx') as workbook:
    worksheet = workbook.add_worksheet()
  
    for row_num, data in enumerate(element_list):
        worksheet.write_row(row_num, 0, data)
  
driver.close()

CodePudding user response:

Instead of navigating to the next page by URL (the URL contains a date and time, which I believe you don't know in advance), try clicking the Next button:

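from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Click "Next", then wait until the old button goes stale (the page has been replaced)
next_button = driver.find_element(By.ID, 'nextbtn')
next_button.click()
WebDriverWait(driver, 10).until(EC.staleness_of(next_button))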
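
On the URL itself: the trailing segment in the question's example looks like a timestamp rather than a random number, which is why it can't be guessed ahead of time. A small sketch that splits such a URL, assuming the last segment follows the %Y%m%d%H%M%S layout (split_page_url is a hypothetical helper name, not part of any library):

```python
from datetime import datetime
from urllib.parse import urlparse


def split_page_url(url):
    """Return (page_number, timestamp) from a .../bursa-saham/<page>/<stamp> URL."""
    parts = urlparse(url).path.strip("/").split("/")
    page, stamp = parts[-2], parts[-1]
    # Assumption: the last path segment is a %Y%m%d%H%M%S timestamp.
    return int(page), datetime.strptime(stamp, "%Y%m%d%H%M%S")


page, stamp = split_page_url("https://market.bisnis.com/bursa-saham/2/20220621181040")
print(page, stamp.isoformat())  # 2 2022-06-21T18:10:40
```

Since the timestamp presumably changes as the listing is updated, building the URL by hand is fragile, and clicking the button avoids the problem.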

P.S. You should also move the line

driver = webdriver.Chrome("C:/Users/krish/Desktop/chromedriver_win32/chromedriver.exe", chrome_options=options,)

out of the loop, so that the same browser instance is reused for all the pages.
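
Putting the two suggestions together, a minimal sketch might look like the one below. It assumes the Next button keeps the id nextbtn on every page, that the titles are in h2 elements as in the question, and that webdriver.Chrome() can locate a driver on its own (recent Selenium versions, or chromedriver on PATH). scrape_titles and wait_for_stale are hypothetical names. The strings "tag name" and "id" are the values behind By.TAG_NAME and By.ID, used here so the loop itself does not need to import Selenium.

```python
def scrape_titles(driver, max_pages, wait_for_stale):
    """Collect <h2> texts page by page, clicking Next between pages.

    wait_for_stale(element) should block until the old page has been replaced.
    """
    titles = []
    for page in range(max_pages):
        # "tag name" / "id" are the string values of By.TAG_NAME / By.ID.
        titles.extend(h.text for h in driver.find_elements("tag name", "h2"))
        if page == max_pages - 1:
            break  # last requested page: nothing more to click
        next_button = driver.find_element("id", "nextbtn")
        next_button.click()
        wait_for_stale(next_button)
    return titles


def main():
    # Selenium is imported here so scrape_titles stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # created once, outside the page loop
    try:
        driver.get("https://market.bisnis.com/bursa-saham/")
        wait = WebDriverWait(driver, 10)
        titles = scrape_titles(driver, 2, lambda el: wait.until(EC.staleness_of(el)))
    finally:
        driver.quit()
    print(titles)
```

Call main() to run it against the live site. The returned list can then be written to the workbook exactly as in the question.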
