webscraping: missing href attribute - simulate mouse clicks for webscraping necessary?

Time:01-05

For a fun webscraping project, I want to collect NHL data from https://www.nhl.com/stats/teams.

There is a clickable Excel Export tag which I can find using selenium and bs4.

Unfortunately, this is where it ends: since there is no href attribute, it seems that I cannot access the data.

I got what I wanted by using pynput to simulate a mouse click, but I wonder:

Could I do that differently? It feels so clumsy.

-> the tag with the Export Icon can be found here :

a 
  

-> Here is my code

`import pynput
from pynput.mouse import Button, Controller
import time

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(executable_path = r'somepath\chromedriver.exe')

URL = 'https://www.nhl.com/stats/teams'

driver.get(URL)
html = driver.page_source  # DOM with JavaScript execution complete
soup = BeautifulSoup(html, 'html.parser')
body = soup.find('body')
print(body.prettify())


mouse = Controller()

time.sleep(5) # Sleep for 5 seconds until page is loaded
mouse.position = (1204, 669) # that's where the icon is on my screen
mouse.click(Button.left, 1) # executes download`

CodePudding user response:

There is no href attribute because the download is triggered by JavaScript. Since you are already working with selenium, find the element and call .click() on it to download the file:

driver.find_element(By.CSS_SELECTOR,'h2>a').click()

This uses a CSS selector to get the <a> that is a direct child of the <h2>. Alternatively, select it directly by its class, which starts with styles__ExportIcon:

driver.find_element(By.CSS_SELECTOR,'a[class^="styles__ExportIcon"]').click()

Example

You may have to deal with the OneTrust cookie banner, so click it away first and then download the sheet.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

url = 'https://www.nhl.com/stats/teams'
driver.get(url)
driver.find_element(By.CSS_SELECTOR,'#onetrust-reject-all-handler').click()
driver.find_element(By.CSS_SELECTOR,'h2>a').click()
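
If the page is slow to render, the plain find_element calls above can fail before the banner or the export link exists. A minimal sketch of a more defensive variant, assuming Selenium 4.6+ (which locates the driver on its own) and Chrome: it waits explicitly for both elements, tolerates a missing banner, and then polls a download folder until the file has finished. The names wait_for_download and export_team_stats are made up for this illustration, and the folder is assumed to exist and to start out empty.

```python
import time
from pathlib import Path


def wait_for_download(directory, timeout=30.0, poll=0.5):
    """Return the newest finished file in `directory`, or None on timeout.

    Chrome keeps a .crdownload suffix on files that are still being
    written, so those (and .tmp files) are ignored.
    """
    directory = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            p for p in directory.iterdir()
            if p.is_file() and p.suffix not in {".crdownload", ".tmp"}
        ]
        if finished:
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def export_team_stats(download_dir, timeout=15):
    """Click the export link and return the path of the downloaded file."""
    # Imported here so wait_for_download stays usable without selenium.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    # Send downloads to a known folder instead of the browser default.
    options.add_experimental_option(
        "prefs",
        {"download.default_directory": str(Path(download_dir).resolve())},
    )
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://www.nhl.com/stats/teams")
        wait = WebDriverWait(driver, timeout)
        try:
            # The cookie banner is not always shown; skip it if absent.
            wait.until(EC.element_to_be_clickable(
                (By.CSS_SELECTOR, "#onetrust-reject-all-handler"))).click()
        except TimeoutException:
            pass
        wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, 'a[class^="styles__ExportIcon"]'))).click()
        return wait_for_download(download_dir)
    finally:
        driver.quit()
```

Calling export_team_stats("downloads") would then return the path of the exported file, or None if nothing finished within the timeout. The selectors are the ones from the answer above and will break if the site changes its markup.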