I'm trying to scrape https://www.livescore.com/en/ but I'm running into problems, mainly because the page structure is different from the other sites I've worked on.
The element IDs are dynamic: the number in the ID increases as I scroll down the page, and only the matches currently visible on the page have elements in the HTML. Also, the markup for the home team name appears to be the same as the markup for the away team name.
This is what I've tried so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.maximize_window()
wait=WebDriverWait(driver,30)
driver.get('https://www.livescore.com/en/football/live/')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#onetrust-accept-btn-handler"))).click()
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        'Home':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Away':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Time':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRowTime_time__2Fkd2 MatchRowTime_isLive__2qWag"]').text
    })
The goal is a DataFrame of the live matches with the home team name, the away team name, and the current minute of play.
Can someone help me?
CodePudding user response:
AFAIK the clearest and simplest way to locate elements inside other elements is to use an XPath expression that starts with a dot (.), which makes the search relative to the parent element instead of the whole document.
The Home and Away team names, as well as the match Time field, can be located with the following locators:
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        # The leading dot keeps each search inside the current match row
        'Home':game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_home")]').text,
        'Away':game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_away")]').text,
        'Time':game1.find_element(By.XPATH, './/span[contains(@id,"match-row")]').text
    })
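Since the goal in the question is a DataFrame, a list of dicts like data1 can be passed straight to pandas. The following is only a minimal sketch, assuming pandas is installed. rows_to_frame is a hypothetical helper name, and the sample rows (team names taken from this thread, minute values made up) stand in for data1 so the snippet runs without a browser:

```python
import pandas as pd

# Placeholder rows shaped like the dicts appended to data1.
# The minute values are invented for illustration only.
sample_rows = [
    {"Home": "Bayern Munich", "Away": "FC Salzburg", "Time": "12'"},
    {"Home": "Liverpool", "Away": "Inter", "Time": "47'"},
]


def rows_to_frame(rows):
    """Build a DataFrame with a fixed column order from a list of row dicts."""
    return pd.DataFrame(rows, columns=["Home", "Away", "Time"])


df = rows_to_frame(sample_rows)
print(df)
```

Passing columns explicitly keeps the column order stable and still yields the three expected columns when no live matches are found and the list is empty.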
CodePudding user response:
To create a pandas DataFrame with the home and away team names from the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:
Using CSS_SELECTOR:
driver.get('https://www.livescore.com/en/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
# id$='...' matches elements whose id ends with the given suffix
Home_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='home-team-name']")))]
Away_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='away-team-name']")))]
df = pd.DataFrame(data=list(zip(Home_team_name, Away_team_name)), columns=['Home Team Name', 'Away Team Name'])
print(df)
Note: You have to add the following imports:
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
   Home Team Name       Away Team Name
0   Bayern Munich          FC Salzburg
1       Liverpool                Inter
2        FC Porto                 Lyon
3      Real Betis  Eintracht Frankfurt
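One caveat when the home and away names are collected as two separate lists: zip() stops at the shorter list, so if one locator matches fewer elements than the other, the extra matches are dropped without any error. The guard below is only a sketch and not part of the answer above. pair_teams is a hypothetical helper name, and the sample names are taken from the console output in this thread:

```python
def pair_teams(home_names, away_names):
    """Pair home and away names, refusing lists of different lengths."""
    if len(home_names) != len(away_names):
        raise ValueError(
            f"got {len(home_names)} home names but {len(away_names)} away names"
        )
    return list(zip(home_names, away_names))


# Sample input standing in for the two scraped lists.
pairs = pair_teams(["Bayern Munich", "Liverpool"], ["FC Salzburg", "Inter"])
print(pairs)
```

The returned list of tuples can be passed to pd.DataFrame in place of list(zip(...)). Locating both names inside each match row, as in the first answer, avoids the pairing problem altogether.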