The website I want to scrape is paginated, but I can't simply iterate over page numbers because every page URL after the first ends with an extra, seemingly random number.
Here is the second page:
https://market.bisnis.com/bursa-saham/2/20220621181040
So the pattern is https://market.bisnis.com/bursa-saham/(page)/20220621181040
If I change only the (page) part, I get a blank page. Here is my code, thanks!
import xlsxwriter
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("start-maximized")
options.add_argument('--no-sandbox')
element_list = []
for page in range(1, 3, 1):
    # Only the page number is appended; the trailing number is unknown
    page_url = "https://market.bisnis.com/bursa-saham/" + str(page)
    driver = webdriver.Chrome("C:/Users/krish/Desktop/chromedriver_win32/chromedriver.exe", chrome_options=options,)
    driver.get(page_url)
    title = driver.find_elements(By.TAG_NAME, 'h2')
    for i in range(len(title)):
        element_list.append([title[i].text])
# Write one headline per row to the spreadsheet
with xlsxwriter.Workbook('result2.xlsx') as workbook:
    worksheet = workbook.add_worksheet()
    for row_num, data in enumerate(element_list):
        worksheet.write_row(row_num, 0, data)
driver.close()
CodePudding user response:
Instead of navigating to the next page by URL (the URL contains a date and time, which I believe you don't know in advance), try clicking the Next button:
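from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Click the pagination button instead of building the URL by hand
next_button = driver.find_element(By.ID, 'nextbtn')
next_button.click()
# The old button goes stale once the next page has replaced it
WebDriverWait(driver, 10).until(EC.staleness_of(next_button))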
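As a quick check of the date-and-time remark, the trailing number from the question's second-page link parses cleanly as a timestamp. This is only a sketch: the YYYYMMDDHHMMSS layout is my assumption from the look of the number, not something the site documents.

```python
from datetime import datetime

# Trailing URL segment copied from the second-page link in the question.
suffix = "20220621181040"

# Assumed layout: YYYYMMDDHHMMSS (4-digit year, then 2 digits per field).
stamp = datetime.strptime(suffix, "%Y%m%d%H%M%S")
print(stamp.isoformat())  # 2022-06-21T18:10:40
```

If that reading is right, the value depends on when the listing was generated, which is why it can't be guessed from the page number alone.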
P.S. You should also move the line
driver = webdriver.Chrome("C:/Users/krish/Desktop/chromedriver_win32/chromedriver.exe", chrome_options=options,)
out of the loop, so that the same browser instance is used to scrape all the pages.