Python Pandas - read_html: No tables found

I am very new to Python and am trying to do my own data analysis.

I am trying to parse data from this website: https://www.tsn.ca/nhl/statistics

I wanted to get the table in a data frame format.

I tried this:

import pandas as pd

players_list_unclean = pd.read_html('https://www.sportsnet.ca/hockey/nhl/players/?season=2021&?seasonType=reg&tab=Skaters')

I get the following error:

    raise ValueError("No tables found")
ValueError: No tables found

I can see there is a table, but for some reason it is not being read.

I found another Stack Overflow solution that recommends using Selenium:

pandas read_html ValueError: No tables found

However, when I tried to implement this code, I could not find the table ID in the HTML page source. Does anyone know another way to do this? I have tried other websites, but I ultimately run into the same issue.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")
# Locate the table by its id (this id belongs to the wunderground page)
elem = driver.find_element(By.ID, "history_table")

head = elem.find_element(By.TAG_NAME, "thead")
body = elem.find_element(By.TAG_NAME, "tbody")

list_rows = []

# find_elements (plural) returns every row; find_element returns only the first
for items in body.find_elements(By.TAG_NAME, "tr"):
    list_cells = []
    # Collect the text of each cell in the current row
    for item in items.find_elements(By.TAG_NAME, "td"):
        list_cells.append(item.text)
    list_rows.append(list_cells)
driver.close()

CodePudding user response：

If you right-click the table and choose Inspect, you will see that the "table" on that page is not actually built with the HTML <table> element.

From the Pandas documentation:

This function searches for <table> elements and only for <tr> and <th> rows and <td> elements within each <tr> or <th> element in the table.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html

Because of that, read_html will not work on this page. You will probably need to find another data source.
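
A quick way to confirm this for any page is to count the <table> elements in the HTML you are about to hand to read_html. The sketch below uses only the standard library. The names `TableCounter` and `count_tables` and the two sample strings are made up for illustration, and this is a simple check, not what pandas does internally:

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements the HTML string contains."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical snippets: a real table, and a grid built from <div> elements
with_table = "<table><tr><th>Player</th></tr><tr><td>A</td></tr></table>"
div_grid = '<div class="grid"><div class="row"><span>A</span></div></div>'

print(count_tables(with_table))  # 1
print(count_tables(div_grid))    # 0
```

If the count is 0 for the raw page source, read_html has nothing to parse, no matter how table-like the page looks in the browser.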

CodePudding user response：

There's no <table> element, but you're in luck: the page loads its data with a fetch request, and you can call that endpoint directly:

https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/sortablePlayerSeasonStats/skater?brand=tsn&type=json&seasonType=regularSeason&season=2021
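
A minimal sketch of loading that endpoint into a data frame follows. The helper names `fetch_json` and `to_frame` are made up here, the User-Agent header is a guess at what the server might require, and the shape of the JSON response is not shown in this thread, so you may need to pick out a nested key before normalizing:

```python
import json
from urllib.request import Request, urlopen

import pandas as pd

URL = ("https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/"
       "sortablePlayerSeasonStats/skater"
       "?brand=tsn&type=json&seasonType=regularSeason&season=2021")


def fetch_json(url, timeout=30):
    """Download a URL and decode the response body as JSON."""
    # Assumption: some servers reject urllib's default User-Agent
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def to_frame(payload):
    """Flatten a list of JSON records (or a single record) into a DataFrame."""
    # json_normalize turns nested dicts into dotted column names
    return pd.json_normalize(payload)
```

Calling `to_frame(fetch_json(URL))` should give a data frame if the endpoint returns a list of player records. If it returns a dict that wraps the list, inspect its keys first and pass the inner list to `to_frame`.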