Web scraping from a dynamic real-time website

I've been trying hard to scrape the following data from this page: https://lambda-app-eia.herokuapp.com/

I need to scrape the numbers highlighted in the following image.

I'm trying to collect them in a list so that I can treat them as numeric data and do some calculations. I've been told that bs4 cannot read dynamically rendered websites, so I switched to Selenium and wrote the following code:

from selenium import webdriver 
from selenium.webdriver.common.by import By

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC



chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome(options=chrome_options)

driver.get("https://lambda-app-eia.herokuapp.com/")

Then I try to create a list:

elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")

job_list = []
for job in elements:
    job_list.append(job.get_attribute('href'))
print(job_list)

As a result, I get a list of length 4 containing only None values.

I suspect it has something to do with the CSS_SELECTOR used for the initial search, since I took the class names from the page source, or with the href, which somehow "filters out" the numbers, but I'm kind of lost at this point. I have never worked with these libraries before, so my errors might be pretty fundamental. Of course, ANY help is greatly appreciated.

CodePudding user response:

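You have to read the element's text, not its href. These elements are presumably not links, so they carry no href attribute, and get_attribute('href') returns None for each of them. Also add some wait time so the page can render before you look up the elements: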

import time

time.sleep(2)  # crude fixed delay so the page can finish rendering
elements = driver.find_elements(By.CSS_SELECTOR, ".MuiTypography-root.MuiTypography-h4.css-2voflx")

job_list = []
for job in elements:
    job_list.append(job.text)
print(job_list)
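
If the fixed sleep turns out to be flaky, an explicit wait is one possible alternative, and since the goal was to do calculations, the scraped strings still need converting to numbers. The following is only a rough sketch, not something tested against this site: the helper names to_numbers and scrape_texts are made up for illustration, the 10-second timeout is arbitrary, and the parsing assumes the values look like "1,234.5" with commas as thousands separators, which may not match what the page actually shows.

```python
import re

# Selector taken from the question as-is.
SELECTOR = ".MuiTypography-root.MuiTypography-h4.css-2voflx"

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def to_numbers(texts):
    """Convert scraped strings to floats, skipping anything without a number."""
    numbers = []
    for text in texts:
        # Assumption: commas are thousands separators, e.g. "1,234.5".
        match = _NUMBER.search(text.replace(",", ""))
        if match:
            numbers.append(float(match.group()))
    return numbers


def scrape_texts(url, selector=SELECTOR, timeout=10):
    """Load the page headlessly and return the text of every matching element."""
    # Imported here so to_numbers() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def filled_texts(d):
            found = d.find_elements(By.CSS_SELECTOR, selector)
            texts = [element.text.strip() for element in found]
            # Falsy until at least one element exists and all of them have text.
            return texts if texts and all(texts) else False

        # until() keeps polling and returns the first truthy result,
        # or raises TimeoutException once the timeout has passed.
        return WebDriverWait(driver, timeout).until(filled_texts)
    finally:
        driver.quit()


if __name__ == "__main__":
    values = to_numbers(scrape_texts("https://lambda-app-eia.herokuapp.com/"))
    print(values)
```

The wait returns as soon as the elements contain text instead of always pausing for two seconds, and it raises a TimeoutException if nothing shows up in time, which is easier to debug than an empty list.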