Extraction of tds from table using BeautifulSoup and arranging them in Pandas dataframe together wit

Time:12-16

I have extracted the following HTML:

<table id=table1>

  <thead>
    <tr >
      <th id="header1">
        "Column 1 Title"
      </th>
      <th id="header2">
        "Column 2 Title"
      </th>
    </tr>
  </thead>
  
  <tbody>
    <tr >
      <td headers="header1">firstrowcolumn1data</td>
      <td headers="header2">firstrowcolumn2data</td>
    </tr>
    <tr >
      <td headers="header1">secondrowcolumn1data</td>
      <td headers="header2">secondrowcolumn2data</td>
    </tr>
  </tbody>
</table>

I need to extract the table data and the id of the table (table1), then arrange them into a Pandas DataFrame, similar to this:

id      table data
table1  firstrowcolumn1data
table1  firstrowcolumn2data
table1  secondrowcolumn1data
table1  secondrowcolumn2data

Answer:

Try this:

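from bs4 import BeautifulSoup
import pandas as pd

# Assumes `html` holds the markup from the question as a string
s = BeautifulSoup(html, 'html.parser')
data = []
for table in s.find_all('table'):
    for td in table.find_all('td'):
        data.append((table.get('id'), td.text))  # (table id, cell text)
df = pd.DataFrame(data, columns=['id', 'table data'])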

Output:

>>> df
       id            table data
0  table1   firstrowcolumn1data
1  table1   firstrowcolumn2data
2  table1  secondrowcolumn1data
3  table1  secondrowcolumn2data
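
If the page contains more than one table, the same loop can be wrapped in a small helper. The following is only a sketch under some assumptions: the function name `cells_with_table_id`, the `sample` markup, and the second table (`table2`) are made up for illustration and are not part of the question. It also uses `get_text(strip=True)` instead of `.text` so that stray whitespace around a cell value is trimmed.

```python
from bs4 import BeautifulSoup
import pandas as pd


def cells_with_table_id(html: str) -> pd.DataFrame:
    """Return one row per <td>, tagged with the id of the table it was found in."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table"):
        for td in table.find_all("td"):
            # get_text(strip=True) trims surrounding whitespace from the cell text
            rows.append((table.get("id"), td.get_text(strip=True)))
    return pd.DataFrame(rows, columns=["id", "table data"])


# Hypothetical sample: a shortened copy of the question's table plus a second table
sample = """
<table id=table1>
  <tbody>
    <tr><td headers="header1">firstrowcolumn1data</td><td headers="header2">firstrowcolumn2data</td></tr>
    <tr><td headers="header1"> secondrowcolumn1data </td><td headers="header2">secondrowcolumn2data</td></tr>
  </tbody>
</table>
<table id="table2">
  <tr><td>othertabledata</td></tr>
</table>
"""

df = cells_with_table_id(sample)
print(df)
```

Note that `find_all('td')` searches recursively, so if tables are nested inside one another, the cells of an inner table would be collected for the outer table as well. The sketch above does not handle that case.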