How to create a DataFrame with values from dynamic data-index using Selenium and Python

Time:03-09

I'm trying to scrape https://www.livescore.com/en/, but I'm having trouble, mainly because its structure differs from the sites I've worked on before.

The match rows carry a dynamic ID whose number increases as you scroll down the page, and the IDs present in the markup belong only to the matches currently visible. Inside each row, the home team element also seems to use the same class as the away team element.

This is what I've tried so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.maximize_window()
wait=WebDriverWait(driver,30)
driver.get('https://www.livescore.com/en/football/live/')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#onetrust-accept-btn-handler"))).click()


games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        'Home':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Away':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Time':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRowTime_time__2Fkd2 MatchRowTime_isLive__2qWag"]').text
    })

The idea is to have a DataFrame of the live matches with the home team name, the away team name and the current minute of play.

Can someone help me?

CodePudding user response:

AFAIK the clearest and simplest way to locate elements inside another element is to use an XPath that starts with a dot (.), which makes the search relative to that element.
The Home and Away team names, as well as the match Time field, can be located inside each row with the following locators:

games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        'Home':game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_home")]').text,
        'Away':game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_away")]').text,
        'Time':game1.find_element(By.XPATH, './/span[contains(@id,"match-row")]').text
    })
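The leading dot is what scopes each lookup to the current row: in Selenium, an XPath starting with // is evaluated against the whole document even when it is called on an element. The sketch below illustrates the same row-scoped pattern without a browser, using Python's standard-library xml.etree.ElementTree as a stand-in for Selenium. The markup, the class names (row, home, away, time) and the function name extract_rows are made up for illustration and are not LiveScore's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the page; NOT LiveScore's real HTML.
SAMPLE = """
<div id="root">
  <div class="row">
    <div class="home">Team A</div>
    <div class="away">Team B</div>
    <span class="time">45'</span>
  </div>
  <div class="row">
    <div class="home">Team C</div>
    <div class="away">Team D</div>
    <span class="time">12'</span>
  </div>
</div>
"""


def extract_rows(markup):
    """Return one dict per row, looking up each field relative to that row."""
    root = ET.fromstring(markup)
    records = []
    for row in root.findall(".//div[@class='row']"):
        # The leading "." keeps every search inside the current row.
        records.append({
            "Home": row.find(".//div[@class='home']").text,
            "Away": row.find(".//div[@class='away']").text,
            "Time": row.find(".//span[@class='time']").text,
        })
    return records


if __name__ == "__main__":
    for record in extract_rows(SAMPLE):
        print(record)
```

A list of dicts in this shape can be passed straight to pd.DataFrame(...) to get one row per match.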

CodePudding user response:

To create a pandas DataFrame with the home team name and away team name from the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:

  • Using CSS_SELECTOR:

    driver.get('https://www.livescore.com/en/')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    Home_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='home-team-name']")))]
    Away_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='away-team-name']")))]
    df = pd.DataFrame(data=list(zip(Home_team_name, Away_team_name)), columns=['Home Team Name', 'Away Team Name'])
    print(df)

  • Note: You have to add the following imports:

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

  • Console Output:

      Home Team Name       Away Team Name
    0  Bayern Munich          FC Salzburg
    1      Liverpool                Inter
    2       FC Porto                 Lyon
    3     Real Betis  Eintracht Frankfurt
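One caveat with the approach above: zip() stops at the shorter of the two lists, and, as the question notes, the page only exposes the rows currently in view. A possible workaround is to collect several snapshots while scrolling and merge them. The sketch below shows only the merging step in plain Python; the snapshot shape (a list of dicts with Home, Away and Time keys) and the function name merge_snapshots are assumptions for illustration, not part of Selenium or the site.

```python
def merge_snapshots(snapshots):
    """Merge per-scroll snapshots into one list of unique matches.

    Each snapshot is assumed to be a list of dicts with "Home", "Away"
    and "Time" keys (a hypothetical shape). Matches are keyed by the
    (Home, Away) pair; a later snapshot overwrites an earlier one for
    the same pair, while the first-seen order of matches is preserved.
    """
    merged = {}
    for snapshot in snapshots:
        for row in snapshot:
            # Overwriting an existing key keeps its original position.
            merged[(row["Home"], row["Away"])] = row
    return list(merged.values())


if __name__ == "__main__":
    # Made-up sample data for two consecutive scroll positions.
    first = [
        {"Home": "Team A", "Away": "Team B", "Time": "10'"},
        {"Home": "Team C", "Away": "Team D", "Time": "5'"},
    ]
    second = [
        {"Home": "Team C", "Away": "Team D", "Time": "6'"},
        {"Home": "Team E", "Away": "Team F", "Time": "1'"},
    ]
    for match in merge_snapshots([first, second]):
        print(match)
```

The merged list can then be handed to pd.DataFrame(...) so that each match appears once, with the most recently seen minute.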
    