For a fun web scraping project, I want to collect NHL data from https://www.nhl.com/stats/teams.
There is a clickable "Excel Export" tag which I can find using Selenium and bs4.
Unfortunately, this is where it ends: since there is no `href` attribute, it seems that I cannot access the data.
I got what I wanted by using pynput to simulate a mouse click, but I wonder: could I do this differently? It feels so clumsy.
-> The tag with the export icon is an `<a>` element.
-> Here is my code
```python
from pynput.mouse import Button, Controller
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# raw string so the backslash in the Windows path is not read as an escape
driver = webdriver.Chrome(executable_path=r'somepath\chromedriver.exe')
URL = 'https://www.nhl.com/stats/teams'
driver.get(URL)
html = driver.page_source  # DOM with JavaScript execution complete
soup = BeautifulSoup(html, 'html.parser')  # explicit parser avoids a bs4 warning
body = soup.find('body')
print(body.prettify())

mouse = Controller()
time.sleep(5)  # sleep for 5 seconds until the page is loaded
mouse.position = (1204, 669)  # that's where the icon is on my screen
mouse.click(Button.left, 1)  # executes the download
```
Answer:
There is no `href` attribute; the download is triggered by JavaScript. Since you are already working with Selenium, find the element and call `.click()` on it to download the file:
```python
driver.find_element(By.CSS_SELECTOR, 'h2>a').click()
```
I used a CSS selector here to get the `<a>` that is a direct child of the `<h2>`. Alternatively, select it directly by its class, which starts with `styles__ExportIcon`:
```python
driver.find_element(By.CSS_SELECTOR, 'a[class^="styles__ExportIcon"]').click()
```
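The `styles__ExportIcon` prefix looks like an auto-generated CSS class name whose hashed suffix may change when the site is rebuilt, which is why matching on the prefix is handier than hard-coding the full class. One caveat: `[class^="..."]` compares against the *entire* `class` attribute string, so it stops matching if the site ever puts another class in front. The stdlib-only sketch below models the two relevant CSS attribute selectors (`^=` prefix and `*=` substring) on plain strings; the class strings are made-up examples, not copied from nhl.com.

```python
# Minimal model of two CSS attribute selectors, applied to a raw class string.
def prefix_match(attr_value: str, prefix: str) -> bool:
    """Like [class^="prefix"]: the whole attribute value must begin with prefix.
    An empty prefix never matches, as in the CSS spec."""
    return bool(prefix) and attr_value.startswith(prefix)


def substring_match(attr_value: str, needle: str) -> bool:
    """Like [class*="needle"]: needle may occur anywhere in the attribute value."""
    return bool(needle) and needle in attr_value


# Hypothetical generated class strings (not taken from the real page)
alone = "styles__ExportIcon-sc-abc123-0"
second = "sc-xyz styles__ExportIcon-sc-abc123-0"

for cls in (alone, second):
    print(f"{cls!r}: ^= {prefix_match(cls, 'styles__ExportIcon')}, "
          f"*= {substring_match(cls, 'styles__ExportIcon')}")
```

If the prefix version ever stops finding the icon, `driver.find_element(By.CSS_SELECTOR, 'a[class*="styles__ExportIcon"]')` is the substring variant, at the cost of also matching any other class that happens to contain that text.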
Example
You may have to deal with the OneTrust cookie banner, so dismiss it first and then download the sheet.
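```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = 'https://www.nhl.com/stats/teams'
driver.get(url)
wait = WebDriverWait(driver, 10)  # the page is rendered by JS, so wait for elements

# dismiss the OneTrust cookie banner first, otherwise it may cover the export icon
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#onetrust-reject-all-handler'))).click()
# the export icon is an <a> directly inside an <h2>; clicking it runs the JS that starts the download
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'h2>a'))).click()
```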
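`.click()` only starts the download, so if the script quits the driver right away the file may be incomplete. A minimal sketch of a polling helper, assuming Chrome (which writes in-progress downloads as `*.crdownload` files) and that you know the download folder. The helper name `wait_for_download`, its parameters, and the `.xlsx` default suffix are hypothetical choices for illustration; pass a different suffix if the export turns out to be another format.

```python
import time
from pathlib import Path


def wait_for_download(directory, before=frozenset(), suffix=".xlsx",
                      timeout=30.0, poll=0.5):
    """Hypothetical helper: poll `directory` until a new file ending in `suffix`
    exists and no Chrome partial (*.crdownload) files remain.

    `before` holds the file names present before the click, so older
    downloads are ignored. Returns the newest new file or raises TimeoutError.
    """
    folder = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        partial = list(folder.glob("*.crdownload"))
        new = [p for p in folder.glob(f"*{suffix}") if p.name not in before]
        if new and not partial:
            return max(new, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished *{suffix} file in {folder} after {timeout}s")
        time.sleep(poll)
```

For example, record `before = set(os.listdir(download_dir))` just before the `h2>a` click, then call `wait_for_download(download_dir, before=before)` before `driver.quit()`. Here `download_dir` is a placeholder; Chrome's download folder can be set through the `download.default_directory` preference via `options.add_experimental_option("prefs", {...})`.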