BS4 get the TH data within the table

Time:02-18

I am trying to read data from a website which has a table like this:

<table border="0" width = "100%">
    <tr>
        <th width="33%">Product</th>
        <th width="34%">ID</th>
        <th width="33%">Serial</th>
    </tr>
    <tr>
        <td align='center'>
            <a target="_TOP" href="Link1.html">ProductName</a>
            <br>
            <a href='Link2.html' TARGET='_TOP'><img src='https://?uid=1'></a>
        </td>

        <td align='center'>
            <a target="_TOP" href="Link2.html">ProductID</a>
            <br>
            <a href='Link2.html' TARGET='_TOP'><img src='https://?uid=2'></a>
        </td>

        <td align='center'>
            <a target="_TOP" href="Link3.html">ProductSerial</a>
            <br>
            <a href='Link2.html' TARGET='_TOP'><img src='https://?uid=3'></a>
        </td>
    </tr>
</table>

All I want from this table is the ProductID, which is the text inside the first <a> tag of the second <td>.

I am trying to use BS4 to find that tag and read its contents, but how do I accurately point BS4 to it?

I have tried:

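import bs4

with open("src/file.html", "r") as inf:
    html = inf.read()

soup = bs4.BeautifulSoup(html, features="html.parser")
# The dict filters on tag attributes, so this looks for <table td=""> and matches nothing
for container in soup.find_all("table", {"td": ""}):
    print(container)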

But it doesn't work. Is there any way to achieve this, i.e. to read the content inside the a tag?

CodePudding user response:

You can use the :nth-of-type CSS selector:

print(soup.select_one("td:nth-of-type(2) a:nth-of-type(1)").text)

Output:

ProductID
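
If the column order might change, a variation is to look up the column by its header text instead of hard-coding the position. This is only a sketch: the helper name cell_text_for_header is made up for illustration, the HTML is a trimmed copy of the table from the question, and it assumes the first row holds the th cells and the second row holds the data.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (image links omitted)
html = """
<table border="0" width="100%">
    <tr>
        <th width="33%">Product</th>
        <th width="34%">ID</th>
        <th width="33%">Serial</th>
    </tr>
    <tr>
        <td align='center'><a target="_TOP" href="Link1.html">ProductName</a><br></td>
        <td align='center'><a target="_TOP" href="Link2.html">ProductID</a><br></td>
        <td align='center'><a target="_TOP" href="Link3.html">ProductSerial</a><br></td>
    </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def cell_text_for_header(soup, header):
    """Hypothetical helper: text of the first <a> in the column titled `header`."""
    table = soup.find("table")
    # Column titles, in document order
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    col = headers.index(header)  # raises ValueError if the header is missing
    # Assumption: the data cells are in the second <tr>
    row = table.find_all("tr")[1]
    cell = row.find_all("td")[col]
    return cell.find("a").get_text(strip=True)


result = cell_text_for_header(soup, "ID")
print(result)
```

For a table with a fixed layout like the one above, the :nth-of-type selector is shorter; the header lookup is only worth it when the columns can move around.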