Hi, I want to scrape the contents of a table from a website using Python. The HTML of the table is shown below.
<table title=""> <tbody>
<tr>
<td colspan="7"><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tr>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><a href="persondetail.php?custnumbe">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tbody>
</table>
Below is the Python code which I'm using to scrape the above HTML.
table_data = soup.find('tbody')
for j in table_data.find_all('tr'):
    row_data = j.find_all('td')
    row = [td.text for td in row_data]
    thewriter.writerow(row)
When I run it, the result contains only the first two cells (the one row that has a <tr>), because the other rows are not wrapped in <tr>.
CodePudding user response:
You could call find_all("td") directly instead of going through the <tr> tags, like this:
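table_data = soup.find('tbody')
# find_all('td') also returns the cells that are not inside a <tr>
cells = [td.text for td in table_data.find_all('td')]
# assuming two cells per row, as in the HTML above
for i in range(0, len(cells), 2):
    thewriter.writerow(cells[i:i + 2])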
From the Beautiful Soup documentation:
"The find_all() method looks through a tag's descendants and retrieves all descendants that match your filters."
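To make the idea concrete, here is a minimal, self-contained sketch of the approach. It uses a shortened copy of the table from the question, and it assumes that every logical row consists of exactly two consecutive cells (CELLS_PER_ROW is a name I made up for that assumption). It writes to an in-memory buffer instead of the asker's thewriter so it can be run as-is.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened copy of the table from the question: only the first two
# cells are wrapped in <tr>; the rest sit directly under <tbody>.
html_doc = """
<table title=""> <tbody>
<tr>
<td colspan="7"><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tr>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tbody>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all("td") searches all descendants of <tbody>, so it also finds
# the cells that are not inside a <tr>.
cells = [td.get_text(strip=True) for td in soup.find("tbody").find_all("td")]

# Assumption: every logical row is made of exactly two consecutive cells.
CELLS_PER_ROW = 2
rows = [cells[i:i + CELLS_PER_ROW] for i in range(0, len(cells), CELLS_PER_ROW)]

# Write to an in-memory buffer here; a real script would open a file
# with open(..., "w", newline="") and pass that to csv.writer instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

print(rows)  # [['', 'N/A*'], ['abc', 'N/A*']]
```

If the real page mixes rows of different widths, pairing by position will not work and you would need another way to tell where a row starts.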
CodePudding user response:
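Apart from the two cells in the first row, all of the <td> elements sit directly under the table body instead of inside a <tr>, so you can select every td under the table regardless of its parent: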
from bs4 import BeautifulSoup
html_doc="""
<table title=""> <tbody>
<tr>
<td colspan="7"><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tr>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><a href="persondetail.php?custnumbe">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# 'table td' selects by tag name; '.table td' would look for class="table"
tds = soup.select('table td')
for td in tds:
    print(td.text)
Output (the blank lines printed for the empty cells are omitted):
N/A*
abc
N/A*
abc
N/A*
N/A*
abc
N/A*
abc
N/A*
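One detail worth checking when you use select(): in a CSS selector, "table td" matches by tag name, while ".table td" matches a td inside an element with class="table". The table in the question has no class attribute, so only the tag form finds anything. The sketch below shows the difference on a shortened copy of the HTML, and also filters out the empty strings that the empty cells produce. The variable names are just for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question.
html_doc = """
<table title=""> <tbody>
<tr>
<td colspan="7"><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tr>
<td colspan="7"><a href="persondetail.php?custnumber">abc</a><br/></td>
<td style="text-align:center;"><strong>N/A*</strong></td>
</tbody>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# ".table" is a class selector; this table has no class, so nothing matches.
by_class = soup.select(".table td")

# "table" is a tag selector; it matches every <td> below the <table>,
# whether or not the cell is wrapped in a <tr>.
by_tag = soup.select("table td")

# Cells that contain only <br/> yield an empty string.
texts = [td.get_text(strip=True) for td in by_tag]

# Keep only the cells that actually contain text.
non_empty = [text for text in texts if text]

print(len(by_class), len(by_tag))  # 0 4
print(non_empty)                   # ['N/A*', 'abc', 'N/A*']
```

Dropping the empty cells is fine for printing, but if you want to rebuild rows afterwards, keep them so that the cell positions still line up.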