Inconsistent web scraping while using selenium, not getting all the scraped data

Time:10-25

I'm trying to scrape the following page: https://lambda-app-eia.herokuapp.com/ with the code below:

from selenium import webdriver 
from selenium.webdriver.common.by import By
import time

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

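# Selenium 4 takes options=; the chrome_options= keyword is deprecated,
# and 'chromedriver' is looked up on PATH by default.
driver = webdriver.Chrome(options=chrome_options)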

driver.get("https://lambda-app-eia.herokuapp.com/") 

and put the data in a list:

elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")
job_list = [ job.text for job in elements]

Whenever I print job_list, I get an incomplete list, and the "2PSI" should be "2ATM":

['2PSI', '41CELCIUS', '56%', '200PPM']

I'm confused, since all the data seem to have the same CSS class. Any help is appreciated.

CodePudding user response:

Add a wait before reading the elements' text: the values are loaded about 2 to 3 seconds after the page itself finishes loading, so reading them immediately returns placeholder or missing data.

import time

time.sleep(3)
elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")
job_list = [ job.text for job in elements]

print(job_list)

Output:

['41CELCIUS', '56%', '2ATM', '2000METROS', '200PPM', '300PPM', '200PPM']
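
A fixed time.sleep(3) can still fail on a slow connection and wastes time on a fast one. As a rough sketch of an alternative, the helper below polls until enough values have loaded or a timeout expires. The name wait_for_texts and its parameters are hypothetical (not part of Selenium), and the count of 7 in the usage comment is an assumption taken from the output shown above.

```python
import time


def wait_for_texts(read_texts, min_count, timeout=10.0, poll=0.5):
    """Poll read_texts() until it yields at least min_count non-empty strings.

    read_texts: zero-argument callable returning a list of strings
                (hypothetical helper, not a Selenium API).
    Raises TimeoutError if the values have not loaded within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Ignore empty strings, which elements report before their text loads.
        texts = [t for t in read_texts() if t.strip()]
        if len(texts) >= min_count:
            return texts
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(texts)} of {min_count} values loaded in {timeout}s"
            )
        time.sleep(poll)


# Possible usage with the driver from the question (assumes 7 values,
# as in the output above):
#
# SELECTOR = ".MuiTypography-root.MuiTypography-h4.css-2voflx"
# job_list = wait_for_texts(
#     lambda: [e.text for e in driver.find_elements(By.CSS_SELECTOR, SELECTOR)],
#     min_count=7,
# )
# print(job_list)
```

Selenium's own WebDriverWait(driver, 10).until(...), already imported in the question, does the same kind of polling and can be used instead of this helper. Waiting on a count only helps if the final number of values is known in advance; if it is not, the fixed sleep above remains the simpler option.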