I am very new to Python and am trying to do my own data analysis.
I am trying to parse data from this website: https://www.tsn.ca/nhl/statistics
I want to get the table into a pandas DataFrame.
I tried this:
import pandas as pd
players_list_unclean = pd.read_html('https://www.sportsnet.ca/hockey/nhl/players/?season=2021&?seasonType=reg&tab=Skaters')
I get the following error:
    raise ValueError("No tables found")
ValueError: No tables found
I can see there is a table on the page, but for some reason it is not being read.
I found another Stack Overflow solution recommending Selenium:
pandas read_html ValueError: No tables found
However, when I tried to adapt that code (shown below), I could not find a table ID in the HTML page source. Does anyone know another way to do this? I have tried other websites, but I ultimately run into the same issue.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")

# Locate the table by its id, then its header and body sections.
elem = driver.find_element(By.ID, "history_table")
head = elem.find_element(By.TAG_NAME, "thead")
body = elem.find_element(By.TAG_NAME, "tbody")

# Collect the text of every cell, one list per table row.
list_rows = []
for items in body.find_elements(By.TAG_NAME, "tr"):
    list_cells = []
    for item in items.find_elements(By.TAG_NAME, "td"):
        list_cells.append(item.text)
    list_rows.append(list_cells)

driver.close()
```
CodePudding user response:
If you right-click the table and choose Inspect, you will see that the "table" on that page does not actually use the HTML <table> element.
From the Pandas documentation:
This function searches for <table> elements and only for <tr> and <th> rows and <td> elements within each <tr> or <th> element in the table.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
I don't think read_html will work on this page. You will probably need to find another data source.
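If you would rather check this from Python than from the browser inspector, here is a minimal sketch that counts `<table>` tags in a piece of HTML using only the standard library. The names `TableCounter` and `count_tables` and the two HTML snippets are made up for illustration; they are not taken from the real page.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.table_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this hook.
        if tag == "table":
            self.table_count += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements start in the given HTML string."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.table_count


# Hypothetical markup: one real table, one grid built from <div> elements.
real_table = (
    "<table><tr><th>Player</th><th>Goals</th></tr>"
    "<tr><td>A</td><td>10</td></tr></table>"
)
div_grid = '<div class="row"><div>Player</div><div>Goals</div></div>'

print(count_tables(real_table))  # 1
print(count_tables(div_grid))    # 0
```

A count of 0 for the HTML the server actually returns would be consistent with the "No tables found" error, since read_html has nothing to parse in that case.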
CodePudding user response:
There's no table, but you're in luck because the page loads its data with a fetch request: