import requests
import os
import pandas as pd
from bs4 import BeautifulSoup
#Importing html
df = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))
print (df['Latest Data'])
All of the documentation I can find online states that extracting a specific column from a dataset requires specifying the column header's name in square brackets, yet this raises a TypeError when I try it:
print (df['Latest Data'])
TypeError: list indices must be integers or slices, not str
If you're curious as to what the dataset looks like without trying to specify the column:
SpotGamma Proprietary Levels Latest Data ... NDX QQQ
0 Ref Price: 4465 ... 15283 372
1 SpotGamma Imp. 1 Day Move: 0.91%, ... NaN NaN
2 SpotGamma Imp. 5 Day Move: 2.11% ... NaN NaN
3 SpotGamma Gamma Index™: 0.48 ... 0.04 -0.08
4 Volatility Trigger™: 4415 ... 15075 373
5 SpotGamma Absolute Gamma Strike: 4450 ... 15500 370
6 Gamma Notional(MM): $157 ... $4 $-397
CodePudding user response:
Note that
df = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))
will return a list of dataframes, not a single one.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html ("Read HTML tables into a list of DataFrame objects.")
It is better to store the result under a name that reflects this:
ldf = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))
and then
df = ldf[0] # replace 0 with the number of the dataframe you want
to get the first dataframe. There may be more than one: check len(ldf)
to see how many you got, and look at each one's columns to find the one that has the column you need.
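If you would rather not hard-code the index, here is a minimal sketch of picking the table by column name. The helper find_table_with_column is hypothetical (it is not part of pandas), and the list ldf below is a hand-built stand-in for what pd.read_html returns, using two rows copied from the output in the question, so that the snippet runs without the HTML file.

```python
import pandas as pd


def find_table_with_column(tables, column):
    """Return the first DataFrame in `tables` that has a column named `column`.

    Hypothetical helper for illustration; not a pandas function.
    """
    for table in tables:
        if column in table.columns:
            return table
    raise KeyError(f"no table has a column named {column!r}")


# Stand-in for the list returned by pd.read_html(...).
# The first table is made up; the second uses two rows from the question.
ldf = [
    pd.DataFrame({"Other": ["a", "b"]}),
    pd.DataFrame(
        {
            "SpotGamma Proprietary Levels": ["Ref Price:", "Volatility Trigger™:"],
            "Latest Data": ["4465", "4415"],
        }
    ),
]

print(len(ldf))  # how many tables were parsed

# Select the table that actually contains the column, then index it by name.
df = find_table_with_column(ldf, "Latest Data")
latest = df["Latest Data"]
print(latest.tolist())
```

With the real file, you would replace the stand-in list with the result of pd.read_html(...). pd.read_html also accepts a match argument that restricts the returned list to tables containing the given text, which may be enough on its own if only one table mentions "Latest Data".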