Finding an element among repeated tags in Python Selenium

Time:03-22

This is the HTML I have on a website:

<table >
    <tbody><tr><th >Certification Number</th><td >48487270</td></tr>
        <tr>
            <th>Label Type</th>
            <td>
                    <img width="69" height="38"  alt="" aria-hidden="true" src="https://i.psacard.com/psacard/images/cert/table-image-ink.png" style="">
                    <span >with fugitive ink technology</span>
            </td>
        </tr>
    <tr><th>Reverse Cert Number/Barcode</th><td>Yes</td></tr>
    <tr><th>Year</th><td>2020</td></tr>
    <tr><th>Brand</th><td>TOPPS</td></tr>
    <tr><th>Sport</th><td>BASEBALL CARDS</td></tr>
    <tr><th>Card Number</th><td>20</td></tr>
    <tr><th>Player</th><td>ARISTIDES AQUINO</td></tr>
    <tr><th>Variety/Pedigree</th><td></td></tr>

    <tr><th>Grade</th><td>NM-MT 8</td></tr>
                    </tbody></table>

I am trying to find a way to read the year and store it in a variable. I normally locate elements with XPath, but these tags are repeated many times with no other distinguishing attributes, so I am unsure how to go about it. The year value will change, so I can't search by its text. Any help would be appreciated.

CodePudding user response:

First, collect the matching WebElements with driver.find_elements (for example, by tag name, or by class name if the elements have one).

That call returns a Python list, so you can then pick an element from it by index, e.g. elements[index].

Alternatively, store all the th/td elements in a list and then search the list for the 'Year' header; the year is the text of the td that follows it.

CodePudding user response:

Use BeautifulSoup to find the <th> tag with the text 'Year'. Then find the next <td> tag and extract the text from that:

from bs4 import BeautifulSoup

html = '''<table >
    <tbody><tr><th >Certification Number</th><td >48487270</td></tr>
        <tr>
            <th>Label Type</th>
            <td>
                    <img width="69" height="38"  alt="" aria-hidden="true" src="https://i.psacard.com/psacard/images/cert/table-image-ink.png" style="">
                    <span >with fugitive ink technology</span>
            </td>
        </tr>
    <tr><th>Reverse Cert Number/Barcode</th><td>Yes</td></tr>
    <tr><th>Year</th><td>2020</td></tr>
    <tr><th>Brand</th><td>TOPPS</td></tr>
    <tr><th>Sport</th><td>BASEBALL CARDS</td></tr>
    <tr><th>Card Number</th><td>20</td></tr>
    <tr><th>Player</th><td>ARISTIDES AQUINO</td></tr>
    <tr><th>Variety/Pedigree</th><td></td></tr>

    <tr><th>Grade</th><td>NM-MT 8</td></tr>
                    </tbody></table>'''
                    
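soup = BeautifulSoup(html, 'html.parser')
# Find the <th> whose text is exactly 'Year', then read the <td> that follows it.
# `string=` is the current argument name; `text=` is its deprecated older alias.
year = soup.find('th', string='Year').find_next('td').text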

print(year)

Output:

2020
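Since the question uses Selenium rather than a hard-coded string, the same lookup can be wrapped in a small function and fed the page markup. The sketch below assumes the HTML comes from `driver.page_source` on an already-loaded page; the function name `cell_after_header` is hypothetical, and it returns None instead of raising when the header is missing.

```python
from bs4 import BeautifulSoup


def cell_after_header(html, label):
    """Return the text of the first <td> after the <th> whose text equals `label`."""
    soup = BeautifulSoup(html, "html.parser")
    th = soup.find("th", string=label)
    if th is None:
        return None  # no header with that exact text
    td = th.find_next("td")
    return td.get_text(strip=True) if td is not None else None
```

With a live driver this would be called as `year = cell_after_header(driver.page_source, "Year")`. Note that `page_source` is a snapshot, so content rendered later by JavaScript would need a wait before reading it.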