How to scrape multiple pages from search results all at once


I am trying to scrape multiple pages from search results and print them all at once, but I got an empty list instead.

Here is the code I used:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

element_list = []

for skip in range(0, 20, 10):
    
    page_url = "https://jdih.esdm.go.id/index.php/web/result?tahun_terbit=2022,2021,2020,2019,2018,2017,2016,2015,2014&skip=" + str(skip)
    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.get(page_url)
    
    Tahun = driver.find_elements(By.CSS_SELECTOR, 'div.numb separator')
    No_Peraturan = driver.find_elements(By.CSS_SELECTOR, 'span.result-value')
    Nama_Peraturan = driver.find_elements(By.CSS_SELECTOR, 'div.result__content__item__title')
    Deskripsi = driver.find_elements(By.CSS_SELECTOR, 'div.result__content__item__desc')

    
    for i in range(len(Tahun)):
        element_list.append([Tahun[i].text, No_Peraturan[i].text, Nama_Peraturan[i].text, Deskripsi[i].text])

print(element_list)

driver.close()

The code only returns an empty list.

Note: the website does not use a 'page' parameter, as search result pages generally do; it uses 'skip' instead.

Can anyone help me with this?

CodePudding user response:

The CSS selector used to find the Tahun elements is incorrect: the div has two classes assigned to it, so 'div.numb separator' matches nothing. As a result, Tahun is an empty list, and since the loop that appends text to element_list iterates over the length of Tahun, nothing gets appended.

Update the selector as shown below.

Tahun = driver.find_elements(By.CSS_SELECTOR, 'div.numb.separator')
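
For reference, here is a minimal sketch of the full corrected script. It assumes the other selectors from the question (span.result-value, div.result__content__item__title, div.result__content__item__desc) still match the page markup; it also reuses a single driver across pages and adds an explicit WebDriverWait (not part of the original answer) so the results have rendered before they are read.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

element_list = []

# Start one browser session and reuse it for every results page
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for skip in range(0, 20, 10):
    page_url = ("https://jdih.esdm.go.id/index.php/web/result"
                "?tahun_terbit=2022,2021,2020,2019,2018,2017,2016,2015,2014"
                "&skip=" + str(skip))
    driver.get(page_url)

    # Wait until at least one result row is present before reading elements
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.numb.separator"))
    )

    Tahun = driver.find_elements(By.CSS_SELECTOR, "div.numb.separator")
    No_Peraturan = driver.find_elements(By.CSS_SELECTOR, "span.result-value")
    Nama_Peraturan = driver.find_elements(By.CSS_SELECTOR, "div.result__content__item__title")
    Deskripsi = driver.find_elements(By.CSS_SELECTOR, "div.result__content__item__desc")

    # Collect one row per result entry
    for i in range(len(Tahun)):
        element_list.append(
            [Tahun[i].text, No_Peraturan[i].text, Nama_Peraturan[i].text, Deskripsi[i].text]
        )

print(element_list)
driver.quit()

Reusing one driver also avoids launching a new Chrome instance on every iteration; the original code only ever closes the last one, since driver.close() sits outside the loop.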