I'm trying to scrape the following page: https://lambda-app-eia.herokuapp.com/ with the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://lambda-app-eia.herokuapp.com/")
and put the data in a list:
elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")
job_list = [ job.text for job in elements]
Whenever I print job_list, I get an incomplete list, and the "2PSI" should be "2ATM":
['2PSI', '41CELCIUS', '56%', '200PPM']
I'm confused, since all the data seem to share the same CSS classes. Any help is appreciated.
CodePudding user response:
Add a short wait before reading the elements' text. The values load 2 to 3 seconds after the page itself finishes loading, so an immediate read returns only the initial, partial values.
import time
time.sleep(3)
elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")
job_list = [ job.text for job in elements]
print(job_list)
Output:
['41CELCIUS', '56%', '2ATM', '2000METROS', '200PPM', '300PPM', '200PPM']
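A fixed sleep can be too short on a slow connection and wastes time on a fast one. As a rough sketch of an alternative, the helper below polls until a condition holds instead of sleeping for a fixed time. The names wait_until, get_value and is_ready are made up for illustration, and the expected count of 7 readings is an assumption taken from the output above, not something the page guarantees.

```python
import time


def wait_until(get_value, is_ready, timeout=10.0, poll=0.5):
    """Call get_value() every `poll` seconds until is_ready(value) is true.

    Returns the first value that satisfies is_ready, or raises
    TimeoutError once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if is_ready(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s; last value: {value!r}")
        time.sleep(poll)


# Usage with the driver from the question (assumes 7 readings, as in the output above):
# selector = ".MuiTypography-root.MuiTypography-h4.css-2voflx"
# job_list = wait_until(
#     lambda: [e.text for e in driver.find_elements(By.CSS_SELECTOR, selector)],
#     lambda texts: len(texts) >= 7,
# )
# print(job_list)
```

Selenium's own WebDriverWait(driver, 10).until(...) follows the same polling pattern and accepts a callable that receives the driver, so the same length check could be passed to it directly.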