Home > Blockchain >  Colspan not working properly using Python Pandas
Colspan not working properly using Python Pandas

Time:03-02

I have some data that i need to convert into an Excel sheet which needs to look like this at the end of the day:

enter image description here

I've tried the following code:

import pandas as pd

result = pd.read_html(
    """<table>
        <tr>
            <th colspan="2">Status N</th>
        </tr>
        <tr>
            <td style="font-weight: bold;">Merchant</td>
            <td>Count</td>
        </tr>
        <tr>
            <td>John Doe</td>
            <td>10</td>
        </tr>
        </table>"""
)

writer = pd.ExcelWriter('out/test_pd.xlsx', engine='xlsxwriter')

print(result[0])

result[0].to_excel(writer, sheet_name='Sheet1', index=False)

writer.save()

This issue here is that the colspan is not working properly. The output is like this instead:

enter image description here

Can someone help me on how i can use colspan on Python Pandas? It would be better if i don't have to use read_html() and do it directly on python code but if it's not possible, i can use read_html()

CodePudding user response:

Since Pandas can't recognize the values and columns title you should introduce them, if you convert HTML text to the standard format, then pandas can handle it correctly. use thead and tbody to split header and values like this.

result = pd.read_html("""
  <table>
    <thead>
        <tr>
            <th colspan="2">Status N</th>
        </tr>
        <tr>
            <td style="font-weight: bold;">Merchant</td>
            <td>Count</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>John Doe</td>
            <td>10</td>
        </tr>
    </tbody>
 </table>
"""
)

To write Dataframe to an excel file you can use the pandas enter image description here

  • Related