In HTML there are two kinds of table: horizontal (headings across the top row) and vertical (headings down the first column). Is there a way to detect which kind a table is in Python?
Maybe this can be done with pandas or BeautifulSoup?
<h2>Horizontal Headings:</h2>
<table style="width:100%">
<tr>
<th>Name</th>
<th>Telephone</th>
<th>Telephone</th>
</tr>
<tr>
<td>Bill Gates</td>
<td>555 77 854</td>
<td>555 77 855</td>
</tr>
</table>
<h2>Vertical Headings:</h2>
<table style="width:100%">
<tr>
<th>Name:</th>
<td>Bill Gates</td>
</tr>
<tr>
<th>Telephone:</th>
<td>555 77 854</td>
</tr>
<tr>
<th>Telephone:</th>
<td>555 77 855</td>
</tr>
</table>
My current function:
def is_vertical_table(table):
    # Check if table is vertical and return True.
    ...
My initial thought was to check whether all th tags are inside the first tr tag, but that doesn't seem like a robust solution, since the rows may be spread across multiple tbody tags, etc.
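The idea of checking where the th tags sit can be made independent of tbody nesting by collecting every tr of a table, whichever section it appears in. A minimal sketch, using the standard library's html.parser instead of BeautifulSoup so it needs no extra packages; TableCellCollector and is_vertical_layout are hypothetical names, and the rule (every row starts with a th and contains no other th) is an assumed heuristic, so a table with both row and column headings is reported as not vertical.

```python
from html.parser import HTMLParser

# The question's two tables, plus a vertical table split across two <tbody> sections
SAMPLE_HTML = """
<table><tr><th>Name</th><th>Telephone</th><th>Telephone</th></tr>
<tr><td>Bill Gates</td><td>555 77 854</td><td>555 77 855</td></tr></table>
<table><tr><th>Name:</th><td>Bill Gates</td></tr>
<tr><th>Telephone:</th><td>555 77 854</td></tr></table>
<table><tbody><tr><th>Name:</th><td>Bill Gates</td></tr></tbody>
<tbody><tr><th>Telephone:</th><td>555 77 854</td></tr></tbody></table>
"""


class TableCellCollector(HTMLParser):
    """For every top-level <table>, record its rows as lists of cell tags ('th'/'td').

    <tr> tags are collected wherever they appear (directly under <table> or inside
    any number of <thead>/<tbody>/<tfoot> sections); nested tables are skipped.
    """

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of rows per top-level <table>
        self._depth = 0   # current <table> nesting depth

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1:
            if tag == "tr":
                self.tables[-1].append([])
            elif tag in ("th", "td") and self.tables[-1]:
                # Append the cell to the most recent row of the current table
                self.tables[-1][-1].append(tag)

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1


def is_vertical_layout(rows):
    """Heuristic: vertical if every non-empty row starts with <th> and has no other <th>."""
    non_empty = [row for row in rows if row]
    if not non_empty:
        return False
    return all(row[0] == "th" and "th" not in row[1:] for row in non_empty)


collector = TableCellCollector()
collector.feed(SAMPLE_HTML)
collector.close()  # flush any buffered input
print([is_vertical_layout(rows) for rows in collector.tables])
# [False, True, True]
```

Because the parser only reacts to start tags, it also copes with omitted closing tags such as a missing </tr>.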
Answer:
You can use pandas.read_html to convert each table to a DataFrame, then use a custom function to compare the number of rows and columns:
import pandas as pd
from io import StringIO

html = '''<h2>Horizontal Headings:</h2>
<table style="width:100%">
<tr>
<th>Name</th>
<th>Telephone</th>
<th>Telephone</th>
</tr>
<tr>
<td>Bill Gates</td>
<td>555 77 854</td>
<td>555 77 855</td>
</tr>
</table>
<h2>Vertical Headings:</h2>
<table style="width:100%">
<tr>
<th>Name:</th>
<td>Bill Gates</td>
</tr>
<tr>
<th>Telephone:</th>
<td>555 77 854</td>
</tr>
<tr>
<th>Telephone:</th>
<td>555 77 855</td>
</tr>
</table>
'''

def wide_or_long(df):
    # More columns than rows: headings most likely run across the top (horizontal)
    if df.shape[1] > df.shape[0]:
        return 'wide'
    # More rows than columns: headings most likely run down the first column (vertical)
    if df.shape[0] > df.shape[1]:
        return 'long'
    return 'square'

# read_html returns one DataFrame per <table>; pandas >= 2.1 expects literal HTML wrapped in StringIO
tables = pd.read_html(StringIO(html))

# checking first table
wide_or_long(tables[0])
# wide
# checking second table
wide_or_long(tables[1])
# long
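Alternative function based on whether pandas detected a header row (when it finds none, it labels the columns 0, 1, ..., n-1):
def wide_or_long(df):
    # Integer column labels 0..n-1 mean no header row was found, i.e. a vertical table
    return 'long' if list(df) == list(range(df.shape[1])) else 'wide'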
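Without pandas, the same "is there a header row?" idea can be sketched in plain Python. When a table has no thead, pandas.read_html treats the leading rows consisting entirely of th cells as header rows (this is its behaviour in recent versions; check yours). The sketch below mimics that rule, assuming each table has already been reduced to a list of rows of cell tag names (a hypothetical representation, e.g. collected with html.parser); count_header_rows and orientation_from_rows are hypothetical helpers, not pandas APIs.

```python
def count_header_rows(rows):
    """Count the leading rows made up entirely of <th> cells."""
    count = 0
    for row in rows:
        if row and all(tag == "th" for tag in row):
            count += 1
        else:
            break  # stop at the first row containing a <td> (or an empty row)
    return count


def orientation_from_rows(rows):
    """'wide' if the table starts with a header row (horizontal), else 'long' (vertical)."""
    return "wide" if count_header_rows(rows) > 0 else "long"


# Rows as lists of cell tag names, mirroring the two tables in the question
horizontal = [["th", "th", "th"], ["td", "td", "td"]]
vertical = [["th", "td"], ["th", "td"], ["th", "td"]]
print(orientation_from_rows(horizontal), orientation_from_rows(vertical))
# wide long
```

Unlike the shape comparison, this test does not depend on how many data rows a horizontal table has.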