Remove NaN from table data in Python?

Time:04-27

Hello everyone, I'm using BS4 to pull a table from an HTML webpage and load it into a pandas DataFrame, but the result is messy (full of NaN values) and I can't get it to print cleanly. Can anyone help?

There is only one table on the webpage. This is the code I'm using and what it returns:

soup = BeautifulSoup(driver.page_source,'html.parser')
df = pd.read_html(str(soup))
print (df)

results:

[   Unnamed: 0    Student Number     Student Name    Placement Date
0         NaN      20808456          Sandy Gurlow    01/13/2023 
1         NaN            NaN                NaN         NaN]

I've tried

df.dropna(inplace=True)

but I get this error:

AttributeError: 'list' object has no attribute 'dropna'

Answer:

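pandas.read_html returns a list of DataFrames, one for each table it finds in the input. So df in your code is a list, not a DataFrame, which is why dropna raises AttributeError. You also don't need BeautifulSoup here, since read_html can parse driver.page_source directly.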
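To see why the error occurs, here is a minimal sketch that builds by hand a list shaped like what read_html returns (a list holding one DataFrame), so no HTML parser or Selenium driver is needed. The column names are taken from the output in the question; the list itself is a stand-in, not a real read_html call.

```python
import pandas as pd

# Hand-built stand-in for the return value of pd.read_html(...):
# a list containing one DataFrame per table found.
tables = [pd.DataFrame({"Student Number": [20808456],
                        "Student Name": ["Sandy Gurlow"]})]

print(type(tables).__name__)      # list
print(hasattr(tables, "dropna"))  # False: a list has no dropna method

df = tables[0]                    # the first (and here only) table
print(type(df).__name__)          # DataFrame
```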

You need to use:

df = pd.read_html(driver.page_source)[0]

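Or, to handle a page with no table at all (read_html raises ValueError: No tables found instead of returning an empty list, so an emptiness check on the list never triggers):

try:
    df = pd.read_html(driver.page_source)[0]
except ValueError:
    print('no table found')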
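Once df is a real DataFrame, the NaNs themselves still need removing. Be careful with a plain df.dropna(): it defaults to how='any', and because the "Unnamed: 0" column is NaN in every row, it would drop every row, including the Sandy Gurlow one. A minimal sketch, assuming the table has the shape printed in the question (rebuilt by hand here instead of scraped), drops the all-empty columns first and then the all-empty rows.

```python
import numpy as np
import pandas as pd

# Rebuild the table from the question's printed output.
df = pd.DataFrame({
    "Unnamed: 0": [np.nan, np.nan],
    "Student Number": [20808456, np.nan],
    "Student Name": ["Sandy Gurlow", np.nan],
    "Placement Date": ["01/13/2023", np.nan],
})

# Default dropna() removes any row containing a NaN: here, every row.
print(len(df.dropna()))  # 0

# Drop columns that are entirely NaN, then rows that are entirely NaN.
df = df.dropna(axis=1, how="all").dropna(how="all")

# The NaN forced the ID column to float; restore integers now that it is gone.
df["Student Number"] = df["Student Number"].astype("int64")
print(df)
```

Dropping columns before rows is a choice rather than a requirement; with how="all" on both axes the result is the same either way for this table.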