Pandas read_html always returns NaNs for table

Time:02-14

I have already tried many of the variations suggested here, but I have not been able to fix the problem. I started with

import pandas as pd
import requests

page = requests.get('http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=industries&sector=10')
df_list = pd.read_html(page.text)

and I can see the correct headers, so I am looking at the right table. I then tried changing the flavor to bs4 and html5lib, with no change. The data values are always NaN, and I only get a single index, index 0, when there should be 3 or 4. My original attempt is identical to code I use for a different table on the same website, where it works perfectly. (Also, this is my first post; please let me know how I can improve it.)

CodePudding user response:

Unfortunately, I had to use Selenium to retrieve the DataFrame. If that is not a problem, feel free to try the following:

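from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 removed find_element_by_id and takes the driver path via Service
driver = webdriver.Chrome(service=Service('<PATH_TO_WEBDRIVER>'))
driver.get('https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=industries&sector=10')
table_html = driver.find_element(By.ID, 'tableSort').get_attribute('outerHTML')
df = pd.read_html(StringIO(table_html))[0]  # StringIO avoids the literal-HTML deprecation in newer pandas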

Using this script, I got the following df: [image: pandas dataframe from given link]
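
If you want to check whether a browser is really needed before reaching for Selenium, one option is to count the table cells that actually contain text in the raw response. This is only a rough sketch using the standard library: the CellCounter class, the count_cells helper, and the sample markup (including its header names) are made up for illustration and are not taken from the Fidelity page. My assumption is that the rows are filled in by JavaScript after the page loads, which would explain headers with no data in page.text.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <th> and <td> cells that contain visible text."""

    def __init__(self):
        super().__init__()
        self.counts = {"th": 0, "td": 0}
        self._current = None    # tag of the cell we are currently inside
        self._has_text = False  # whether that cell has non-blank text

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self._current = tag
            self._has_text = False

    def handle_data(self, data):
        if self._current and data.strip():
            self._has_text = True

    def handle_endtag(self, tag):
        if tag == self._current:
            if self._has_text:
                self.counts[tag] += 1
            self._current = None


def count_cells(html):
    """Return how many header and data cells in `html` hold text."""
    parser = CellCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for page.text: headers present, body cells empty
sample = """
<table id="tableSort">
  <thead><tr><th>Name</th><th>Change</th></tr></thead>
  <tbody><tr><td></td><td></td></tr></tbody>
</table>
"""
print(count_cells(sample))  # {'th': 2, 'td': 0}
```

Running count_cells(page.text) on the real response and seeing headers but zero data cells would suggest that requests never receives the data, so no read_html flavor can recover it. A non-zero td count would point to a parsing issue instead.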
