I have extracted some HTML using BeautifulSoup and written a function that keeps only the useful information. I intend to run this function for multiple keywords and put the results in a DataFrame. However, I cannot get all the lists into a pandas DataFrame.
Example:
words = ['header', 'title', 'number']
The following code gets me lists of all headers, titles and numbers; the lists are all the same length.
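import re
def create_list(x):
    column = []
    # BRK is the BeautifulSoup object parsed earlier
    BRKlist = BRK.find_all(x)
    for n in BRKlist:
        # Patterns for the opening and closing tag, e.g. <header> and </header>
        drop_beginning = r'<' + x + '>'
        drop_end = r'</' + x + '>'
        no_beginning = re.sub(drop_beginning, '', str(n))
        final = re.sub(drop_end, '', str(no_beginning))
        column.append(final)
    print(column)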
This code outputs:
['header1', 'header2', 'header3']
['title1', 'title2', 'title3']
['number1', 'number2', 'number3']
I am looking for a way to get a single DataFrame that looks like this:
header | title | number |
---|---|---|
header1 | title1 | number1 |
header2 | title2 | number2 |
header3 | title3 | number3 |
Getting the lists was no problem, but when I make an empty data frame:
df = pd.DataFrame({x: []})
and try to append the columns, I get the following error:
TypeError: unhashable type: 'list'
Is there any way to circumvent this, or any other/easier way to "append columns"?
Answer:
If you want to build a DataFrame with only three columns, the easiest way may be:
import pandas as pd
A = [['header1', 'header2', 'header3'],
     ['title1', 'title2', 'title3'],
     ['number1', 'number2', 'number3']]
df = pd.DataFrame()
# Each inner list becomes one column
df['header'] = [A[0][i] for i in range(3)]
df['title'] = [A[1][i] for i in range(3)]
df['number'] = [A[2][i] for i in range(3)]
df
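If the number of keywords may change, a more general option is to build the DataFrame from a dict that maps each keyword to its list. The following is only a minimal sketch under some assumptions: `create_list` is rewritten here as a hypothetical version that returns its list instead of printing it, and the `scraped` dict is made-up stand-in data replacing the BeautifulSoup lookup. The last lines illustrate one plausible cause of the `TypeError`, assuming `x` in `pd.DataFrame({x: []})` was the whole `words` list rather than a single string: a list cannot be used as a dict key.

```python
import pandas as pd

# Hypothetical stand-in for the scraped data; in the question these lists
# would come from BRK.find_all(...) inside create_list.
scraped = {
    'header': ['header1', 'header2', 'header3'],
    'title': ['title1', 'title2', 'title3'],
    'number': ['number1', 'number2', 'number3'],
}


def create_list(word):
    """Hypothetical variant of the question's function: return the list, do not print it."""
    return list(scraped[word])


words = ['header', 'title', 'number']

# One column per keyword: every key is a string (hashable), every value a list.
df = pd.DataFrame({word: create_list(word) for word in words})
print(df)

# Assumed cause of the error: using the list itself as a dict key raises TypeError.
try:
    {words: []}
    raised = False
except TypeError:
    raised = True
print(raised)
```

Because the dict is built in a comprehension over `words`, adding a fourth keyword only requires extending that list, provided all lists still have the same length.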