Hello everyone. I'm using BS4 to pull a table from an HTML webpage and trying to load it into a pandas DataFrame, but the result is very messy and I can't get it to print properly. Can anyone help?
There is only one table on the webpage. This is the code I'm using, followed by what it returns.
soup = BeautifulSoup(driver.page_source,'html.parser')
df = pd.read_html(str(soup))
print(df)
Results:
[ Unnamed: 0 Student Number Student Name Placement Date
0 NaN 20808456 Sandy Gurlow 01/13/2023
1 NaN NaN NaN NaN]
I've tried to clean it up with
df.dropna(inplace=True)
but I get this error:
AttributeError: 'list' object has no attribute 'dropna'
CodePudding user response:
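pandas.read_html
returns a list of DataFrames, one per table it finds in the input, even when there is only one table. That is why calling dropna on the result raises the AttributeError: you are calling it on the list, not on a DataFrame.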
You need to use:
df = pd.read_html(driver.page_source)[0]
Or, to handle the case where the page has no table (read_html raises a ValueError then, rather than returning an empty list):
try:
    df = pd.read_html(driver.page_source)[0]
except ValueError:
    print('no table found')
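To tie this together, here is a minimal sketch of the whole flow, assuming a page shaped like the output in the question. SAMPLE_HTML, first_table and drop_empty are made-up names for illustration, and SAMPLE_HTML only stands in for driver.page_source. The markup is wrapped in StringIO because recent pandas versions deprecate passing a literal HTML string to read_html. Also note that a plain df.dropna() drops every row containing any NaN, so with an all-NaN "Unnamed: 0" column it would likely leave you with an empty frame; how="all" only removes rows and columns that are entirely empty.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for driver.page_source, modelled on the question's output.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>Student Number</th><th>Student Name</th><th>Placement Date</th></tr>
  <tr><td></td><td>20808456</td><td>Sandy Gurlow</td><td>01/13/2023</td></tr>
  <tr><td></td><td></td><td></td><td></td></tr>
</table>
"""


def first_table(html):
    """Return the first table in `html` as a DataFrame, or None if there is none."""
    try:
        # read_html returns a list of DataFrames, one per <table> element.
        tables = pd.read_html(StringIO(html))
    except ValueError:
        # read_html raises ValueError when it finds no tables.
        return None
    return tables[0]


def drop_empty(df):
    """Drop columns and rows that are entirely NaN, then renumber the rows."""
    cleaned = df.dropna(axis=1, how="all")       # e.g. the empty "Unnamed: 0" column
    cleaned = cleaned.dropna(axis=0, how="all")  # e.g. the trailing blank row
    return cleaned.reset_index(drop=True)
```

With a real page you would call drop_empty(first_table(driver.page_source)) and check for None first. read_html also needs an HTML parser installed (lxml, or bs4 together with html5lib).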