I'm trying to scrape this site: https://www.flashscore.com/basketball/italy/serie-a2-2021-2022/results/
The first issue is that at the bottom of the loaded page there is a "Show more matches" link, but when I inspect the document I only find this:
href="#">Show more matches
There is no real URL, so I don't know how to get the link to make Python click on it.
The second issue is also about hidden URLs: clicking on any match opens a pop-up (for the first match, for example, it is https://www.flashscore.com/match/z1DMLBiE/#/match-summary/match-summary). These links are also missing when I inspect the page. I would like to get them, or simply the code that follows /match/ (z1DMLBiE in the example above). Every match has a different code. When you inspect the page there is also an ID that contains this code, but I am not able to isolate it: I only get a whole list from which I cannot extract it.
My little piece of code:
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
URL = "https://www.flashscore.com/basketball/italy/serie-a2-2021-2022/results/"
driver = webdriver.Chrome(r"C:\chromedriver.exe")
driver.get(URL)
# Wait for page to fully render
sleep(5)
soup = BeautifulSoup(driver.page_source, "html.parser")
for div in soup.find_all(class_='sportName basketball'):
    print(div)
driver.quit()
Thanks in advance for any help!
CodePudding user response:
Instead of extracting the links to follow programmatically, you can click the actual UI element with a simulated cursor using Selenium: https://www.selenium.dev/documentation/webdriver/elements/interactions/#click
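As a rough sketch of that approach (not tested against the live site): keep clicking the "Show more matches" element until it is no longer in the page. The helper names `click_until_gone` and `load_all_matches` are made up for illustration, and the `.event__more` selector is an assumption about the page's markup that should be checked in the inspector.

```python
import time


def click_until_gone(find_button, click, pause=2.0, max_clicks=50):
    """Click the element returned by find_button() until it returns None.

    Returns the number of clicks performed. max_clicks is a safety cap so the
    loop cannot run forever if the element never disappears.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break
        click(button)
        clicks += 1
        time.sleep(pause)  # give the page time to append the new rows
    return clicks


def load_all_matches(driver):
    """Selenium wiring for the helper above (hypothetical, adjust the selector)."""
    # Imported here so click_until_gone stays usable without Selenium installed.
    from selenium.webdriver.common.by import By

    def find_button():
        # ASSUMPTION: the "Show more matches" element carries this class.
        found = driver.find_elements(By.CSS_SELECTOR, ".event__more")
        return found[0] if found else None

    def click(button):
        # A JavaScript click avoids "element not clickable" errors from overlays.
        driver.execute_script("arguments[0].click();", button)

    return click_until_gone(find_button, click)
```

Call `load_all_matches(driver)` once the page has rendered. `find_elements` returns an empty list instead of raising when nothing matches, which is what lets the loop stop cleanly.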
CodePudding user response:
After some research on Selenium I improved my code a little: I am now able to click on the link to load more data. The problem is that one click is not enough, so I repeated the call 4 times, but there must be a better way to keep clicking for as long as the link exists (I tried a loop without success). I still have the problem of finding the code of every match, since I cannot make it click on every game. Here's the updated code:
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
URL = "https://www.flashscore.com/basketball/italy/serie-a2-2021-2022/results/"
driver = webdriver.Chrome(r"C:\chromedriver.exe")
driver.get(URL)
# Wait for page to fully render
sleep(5)
driver.maximize_window()
driver.find_element_by_id('onetrust-accept-btn-handler').click()
javaScript = "document.getElementsByClassName('event__more event__more--static')[0].click();"
driver.execute_script(javaScript)
sleep(5)
driver.execute_script(javaScript)
sleep(5)
driver.execute_script(javaScript)
sleep(5)
driver.execute_script(javaScript)
soup = BeautifulSoup(driver.page_source, "html.parser")
for div in soup.find_all(class_='sportName basketball'):
    print(div)
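For the match codes, clicking every game may not be needed if the code is already part of each row's ID. Below is a minimal sketch, assuming the IDs look like `g_3_z1DMLBiE` (a prefix, a sport number, then the code). That format is an assumption and should be confirmed in the inspector. The function names and the sample HTML are hypothetical.

```python
import re

# ASSUMPTION: each match row has an id such as "g_3_z1DMLBiE".
MATCH_ID_PATTERN = re.compile(r'id="g_\d+_([A-Za-z0-9]+)"')


def extract_match_codes(page_source):
    """Return the match codes found in the HTML, in page order, without duplicates."""
    codes = []
    for code in MATCH_ID_PATTERN.findall(page_source):
        if code not in codes:
            codes.append(code)
    return codes


def match_summary_url(code):
    """Build the pop-up URL in the form quoted in the question."""
    return f"https://www.flashscore.com/match/{code}/#/match-summary/match-summary"


# Hypothetical sample row, only to show the call.
sample = '<div id="g_3_z1DMLBiE"></div>'
for code in extract_match_codes(sample):
    print(code, match_summary_url(code))
```

With the script above, this would be called as `extract_match_codes(driver.page_source)` after the last click, before the driver is closed.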