I have the following HTML, which I have extracted:
<table id=table1>
<thead>
<tr >
<th id="header1">
"Column 1 Title"
</th>
<th id="header2">
"Column 2 Title"
</th>
</tr>
</thead>
<tbody>
<tr >
<td headers="header1">firstrowcolumn1data</td>
<td headers="header2">firstrowcolumn2data</td>
</tr>
<tr >
<td headers="header1">secondrowcolumn1data</td>
<td headers="header2">secondrowcolumn2data</td>
</tr>
</tbody>
</table>
I need to extract the table data and the id of the table (table1), then arrange them into a pandas DataFrame like this:
| id | table data |
|---|---|
| table1 | firstrowcolumn1data |
| table1 | firstrowcolumn2data |
| table1 | secondrowcolumn1data |
| table1 | secondrowcolumn2data |
CodePudding user response:
Loop over every table and its td cells, pairing each cell's text with the table's id:
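from bs4 import BeautifulSoup
import pandas as pd
# s is assumed to be the parsed document, e.g. s = BeautifulSoup(html, 'html.parser')
data = []
for table in s.find_all('table'):
    for td in table.find_all('td'):
        # pair each cell's text with the id of its parent table
        data.append((table.get('id'), td.text))
df = pd.DataFrame(data, columns=['id', 'table data'])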
Output:
>>> df
       id            table data
0  table1   firstrowcolumn1data
1  table1   firstrowcolumn2data
2  table1  secondrowcolumn1data
3  table1  secondrowcolumn2data
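If BeautifulSoup is not available, roughly the same pairing can be sketched with the standard library's html.parser instead. This is a different technique from the answer above, not a drop-in equivalent: the class name TableCellCollector and the shortened html string are made up for illustration, and the sketch assumes that cells do not contain nested tables.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect (table id, cell text) pairs for every <td> in the input."""

    def __init__(self):
        super().__init__()
        self.rows = []            # collected (table_id, cell_text) tuples
        self._table_ids = []      # ids of the tables we are currently inside
        self._cell_parts = None   # text fragments of the <td> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # a table without an id attribute is recorded as None
            self._table_ids.append(dict(attrs).get("id"))
        elif tag == "td":
            self._cell_parts = []

    def handle_data(self, data):
        # only keep text that appears inside a <td>
        if self._cell_parts is not None:
            self._cell_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell_parts is not None:
            table_id = self._table_ids[-1] if self._table_ids else None
            self.rows.append((table_id, "".join(self._cell_parts)))
            self._cell_parts = None
        elif tag == "table" and self._table_ids:
            self._table_ids.pop()


# shortened, hypothetical stand-in for the extracted HTML in the question
html = """
<table id=table1>
<thead><tr><th id="header1">"Column 1 Title"</th></tr></thead>
<tbody>
<tr><td headers="header1">firstrowcolumn1data</td><td headers="header2">firstrowcolumn2data</td></tr>
<tr><td headers="header1">secondrowcolumn1data</td><td headers="header2">secondrowcolumn2data</td></tr>
</tbody>
</table>
"""

collector = TableCellCollector()
collector.feed(html)
collector.close()
rows = collector.rows
```

The resulting rows list has the same shape as data in the answer, so it can be passed to pd.DataFrame(rows, columns=['id', 'table data']) in the same way.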