I am trying to read data from a website which has a table like this:
<table border="0" width = "100%">
<tr>
<th width="33%">Product</th>
<th width="34%">ID</th>
<th width="33%">Serial</th>
</tr>
<tr>
<td align='center'>
<a target="_TOP" href="Link1.html">ProductName</a>
<br>
<a href='Link2.html' TARGET='_TOP'><img src='https://?uid=1'></a>
</td>
<td align='center'>
<a target="_TOP" href="Link2.html">ProductID</a>
<br>
<a href='Link2.html' TARGET='_TOP'><img src='https://?uid=2'></a>
</td>
<td align='center'>
<a target="_TOP" href="Link3.html">ProductSerial</a>
<br>
<a href='Link2.html' TARGET='_TOP'><img src='https://?uid=3'></a>
</td>
</tr>
</table>
All I want from this table is the ProductID, which is the text inside the first <a> tag of the second <td>.
I am trying to use BS4 to find that tag and read its content, but how do I point BS4 to it accurately?
I have tried:
import bs4
with open("src/file.html", 'r') as inf:
    html = inf.read()
soup = bs4.BeautifulSoup(html, features="html.parser")
for container in soup.find_all("table", {"td": ""}):
    print(container)
But it doesn't work. Is there any way to achieve this, i.e. to read the content inside the <a> tag?
CodePudding user response:
You can use the :nth-of-type CSS selector to pick the first <a> inside the second <td>:
print(soup.select_one("td:nth-of-type(2) a:nth-of-type(1)").text)
Output:
ProductID
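As for why the original attempt printed nothing: the second argument of find_all filters on tag attributes, so {"td": ""} asks for a <table> that has an attribute named td, and no such table exists. If the column position might change, a possible alternative is to look the column up by its header text instead of hard-coding the index. The following is only a sketch under assumptions: the helper name column_link_text is hypothetical, the HTML is a trimmed inline copy of the table from the question, and it assumes there is one header row whose <th> order matches the <td> order.

```python
from bs4 import BeautifulSoup

# Trimmed inline copy of the table from the question (assumption: same structure as the real page).
HTML = """
<table border="0" width="100%">
<tr>
<th width="33%">Product</th>
<th width="34%">ID</th>
<th width="33%">Serial</th>
</tr>
<tr>
<td align='center'>
<a target="_TOP" href="Link1.html">ProductName</a>
<br>
<a href='Link2.html' TARGET='_TOP'><img src='https://?uid=1'></a>
</td>
<td align='center'>
<a target="_TOP" href="Link2.html">ProductID</a>
<br>
<a href='Link2.html' TARGET='_TOP'><img src='https://?uid=2'></a>
</td>
<td align='center'>
<a target="_TOP" href="Link3.html">ProductSerial</a>
<br>
<a href='Link2.html' TARGET='_TOP'><img src='https://?uid=3'></a>
</td>
</tr>
</table>
"""


def column_link_text(html, header):
    """Return the text of the first <a> in the column titled `header`, or None.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return None

    # Map the header text to its column position.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    if header not in headers:
        return None
    index = headers.index(header)

    # Walk the rows; the header row has no <td>, so it is skipped by the length check.
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) > index:
            link = cells[index].find("a")
            if link is not None:
                return link.get_text(strip=True)
    return None


print(column_link_text(HTML, "ID"))
```

With the table above this prints ProductID. Returning None for a missing table, header, or link avoids the AttributeError that select_one(...).text raises when nothing matches.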