How do I export a read_html df to Excel, when it related to table ID rather than data in the code?-CodePudding

I am experiencing this error with the code below:

File "\<stdin\>", line 1, in \<module\>
AttributeError: 'list' object has no attribute 'to_excel'

I want to save the table I am scraping from wikipedia to an Excel file - but I can't work out how to adjust the code to get the data list from the terminal to the Excel file using to_excel.

I can see it works for a similar problem when a dataset has data set out as a 'DataFrame' (i.e. df = pd.DataFrame(data, columns = \['Product', 'Price'\]).

But can't work out how to adjust my code for the df = pd.read*html(str(congresstable)) line - which I think is the issue. (i.e. using read*_html and sourcing the data from a table id)

How can I adjust the code to make it save an excel file to the path specified?

from bs4 import BeautifulSoup
import requests
import pandas as pd

wiki_url = 'https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives'
table_id = 'votingmembers'

response = requests.get(wiki_url)
soup = BeautifulSoup(response.text, 'html.parser')

congress_table = soup.find('table', attrs={'id': table_id})
df = pd.read_html(str(congress_table))

df.to_excel (r'C:\Users\name\OneDrive\Code\.vscode\Test.xlsx', index = False, header=True)

print(df)

I was expecting the data list to be saved to Excel at the folder path specified.

I tried following multiple guides, but they don't show the read_html item, only DataFrame solutions.

CodePudding user response：

pandas.read_html() creates a list of tables respectiv dataframe objects, so you have to pick one by index in your case [0] - You also do not need requests and BeautifulSoup, separatly, just go with pandas.read_html()

pd.read_html(wiki_url,attrs={'id': table_id})[0]

Example

import pandas as pd

wiki_url = 'https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives'
table_id = 'votingmembers'

congress_table = soup.find('table', )
df = pd.read_html(wiki_url,attrs={'id': table_id})[0]
df.to_excel (r'C:\Users\name\OneDrive\Code\.vscode\Test.xlsx', index = False, header=True)