Scraping a website: how to load more results and get hidden URLs

I'm trying to scrape this site: https://www.flashscore.com/basketball/italy/serie-a2-2021-2022/results/

The first issue: once the page loads, there's a "Show more matches" link at the bottom, but when I inspect the document there's only this:

href="#">Show more matches

but no actual URL, so I don't know how to make Python click on it.

The second issue is also about hidden URLs: clicking a match opens a pop-up (the first match, for example, opens https://www.flashscore.com/match/z1DMLBiE/#/match-summary/match-summary). These links are also hidden when I inspect the page. I'd like to get them, or simply the code after the /match/ part (z1DMLBiE in the example above). Every match has a different code. When I inspect, there's also an ID that contains this code, but I can't isolate it: I only get a whole list from which I can't extract it.
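A minimal sketch of isolating those codes from the rendered page source, assuming each match row carries an id of the form g_3_z1DMLBiE (a "g" prefix, a numeric sport id, then the match code). That id format is an assumption about Flashscore's markup, so check it against what the inspector shows. MatchIdParser, match_codes and MATCH_URL are hypothetical names; the parser uses the standard library's html.parser so it needs no extra packages.

```python
from html.parser import HTMLParser

# Hypothetical template that rebuilds the pop-up URL from a match code.
MATCH_URL = "https://www.flashscore.com/match/{code}/#/match-summary/match-summary"


class MatchIdParser(HTMLParser):
    """Collect match codes from element ids assumed to look like "g_3_z1DMLBiE"."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        element_id = dict(attrs).get("id") or ""
        parts = element_id.split("_", 2)
        # Assumed format: "g", a numeric sport id, then the match code.
        if len(parts) == 3 and parts[0] == "g" and parts[1].isdigit():
            self.codes.append(parts[2])


def match_codes(html):
    """Return the match codes found in the HTML, in document order."""
    parser = MatchIdParser()
    parser.feed(html)
    parser.close()
    return parser.codes
```

With a Selenium driver, match_codes(driver.page_source) would return the codes in page order, and MATCH_URL.format(code=code) rebuilds each pop-up URL.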

My little piece of code:

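from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
from time import sleep

URL = "https://www.flashscore.com/basketball/italy/serie-a2-2021-2022/results/"

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r"C:\chromedriver.exe"))
driver.get(URL)
# Wait for page to fully render
sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")

# Print every element whose class attribute is exactly "sportName basketball"
for div in soup.find_all(class_='sportName basketball'):
    print(div)

driver.quit()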

Thanks for any help!

Response:

Instead of extracting the links and following them programmatically, you can have Selenium click the actual UI element, just as a user would: https://www.selenium.dev/documentation/webdriver/elements/interactions/#click

Response:

After some research on Selenium I improved my code a little: I'm now able to click the link that loads more data. The problem is that one click is not enough, so I repeated the click four times, but there must be a better way to keep clicking as long as the link exists (I tried a loop without success). I also still have the problem of finding the code of every match, since I can't make it click on every game. Here's the updated code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from time import sleep

URL = "https://www.flashscore.com/basketball/italy/serie-a2-2021-2022/results/"

driver = webdriver.Chrome(service=Service(r"C:\chromedriver.exe"))
driver.get(URL)
# Wait for page to fully render
sleep(5)

driver.maximize_window()

# Accept the cookie banner so it does not cover the "Show more matches" link
driver.find_element(By.ID, 'onetrust-accept-btn-handler').click()

# Click the first "Show more matches" link through JavaScript
javaScript = "document.getElementsByClassName('event__more event__more--static')[0].click();"

# Click four times, giving each new batch of results time to load
for _ in range(4):
    driver.execute_script(javaScript)
    sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")

for div in soup.find_all(class_='sportName basketball'):
    print(div)

driver.quit()
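Rather than repeating the click a fixed number of times, one option is to let the JavaScript report whether it found the link and keep clicking until it returns false. This is a minimal sketch, assuming the link keeps the class names used above and stops matching the selector once every result is loaded; load_all_results, CLICK_SHOW_MORE_JS, pause and max_clicks are hypothetical names, and max_clicks is only a safety cap in case the link never disappears. Selenium runs the script as the body of an anonymous function, which is why a bare return works.

```python
import time

# Clicks "Show more matches" if present and tells Python whether it did.
# The class names are an assumption taken from the code above.
CLICK_SHOW_MORE_JS = """
const link = document.querySelector('.event__more.event__more--static');
if (link) {
    link.click();
    return true;
}
return false;
"""


def load_all_results(driver, pause=3.0, max_clicks=50):
    """Keep clicking "Show more matches" until it is gone; return the click count.

    driver is any object with Selenium's execute_script method.
    """
    clicks = 0
    while clicks < max_clicks and driver.execute_script(CLICK_SHOW_MORE_JS):
        clicks += 1
        time.sleep(pause)  # give the next batch of matches time to render
    return clicks
```

With the driver from the code above, load_all_results(driver) would replace the four execute_script calls, after which driver.page_source can be parsed as before.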