I´ve been hardly trying to scrape the following data from this page: https://lambda-app-eia.herokuapp.com/
I need to scrape the numbers selected: in the following image.
Im trying to create a list so that i can treat them as data types , and make some calculations. i´ve been told bs4 does not read dynamic websites, so i switched to selenium instead, making the followinng code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', chrome_options=chrome_options)
driver.get("https://lambda-app-eia.herokuapp.com/")
Then I try to create a list:
elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")
job_list = []
for job in elements:
job_list.append(job.get_attribute('href'))
print(job_list)
And i get as a result a lenght 4 None list.
I suspect it has something to do with the initial searching CSS_SELECTOR, since i took the data as a class from the font code, or probably something to do with the href, which somehow "filters" the data number, but im kind of lost at this point. I have never worked with such libraries so my errors might be pretty fundamental. Of course, ANY help is strongly appreciated.
CodePudding user response:
You have to get the text of the element not the href, like below, also add some wait time:
time.sleep(2)
elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")
job_list = []
for job in elements:
job_list.append(job.text)
print(job_list)