how to get a value from a th tag containing a nested tag parsing in python?

Time:03-07

How can I get the value of a th without the text of the a tag nested inside it?

<th scope="col">1926
    <sup id="cite_ref-2011CH_22-0" >
        <a href="#cite_note-2011CH-22">[22]</a>
    </sup>
</th>

I tried

table = soup.find('table', {"class": "standard"})
data_th = table.find('tbody').find_all('tr', {"class": "bright"})
for tr in data_th:
    th_list = tr.find_all('th')
    for th in th_list:
        if(th.find('a')):
            print(th.text)

but the output is

1926[22]
1931[23]
1939[23]

and I need

1926
1931
1939

CodePudding user response:

One approach is to select only the text node that is a direct child of the th, so the text inside the nested sup/a tags is ignored:

th.find(string=True, recursive=False)

Example

from bs4 import BeautifulSoup

html='''
<th scope="col">1926
    <sup id="cite_ref-2011CH_22-0" >
        <a href="#cite_note-2011CH-22">[22]</a>
    </sup>
   another text
</th>
'''

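# name the parser explicitly; html.parser also keeps a bare <th> outside a <table>
soup = BeautifulSoup(html, 'html.parser')

for th in soup.find_all('th'):
    # recursive=False: only th's direct children are searched, so "[22]" is skipped
    print(th.find(string=True, recursive=False).strip())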
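Applied to the table layout from the question, the same idea might look like the sketch below. The markup is hypothetical: the class names "standard" and "bright" are taken from the asker's code and the years from the expected output, so the real page may differ. It uses `string=True`, the newer name (Beautiful Soup 4.4.0+) for the `text=True` argument, and skips any th that has no direct text of its own.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question (class names from the asker's
# code, years from the expected output); the real page may differ.
html = """
<table class="standard">
  <tbody>
    <tr class="bright">
      <th scope="col">1926<sup id="cite_ref-2011CH_22-0"><a href="#cite_note-2011CH-22">[22]</a></sup></th>
      <th scope="col">1931<sup><a href="#cite_note-23">[23]</a></sup></th>
      <th scope="col">1939<sup><a href="#cite_note-23">[23]</a></sup></th>
    </tr>
  </tbody>
</table>
"""


def header_years(markup):
    """Return the direct (non-nested) text of each <th> in the 'bright' rows."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", {"class": "standard"})
    years = []
    for tr in table.find("tbody").find_all("tr", {"class": "bright"}):
        for th in tr.find_all("th"):
            # Only direct children are searched, so "[22]" inside <sup><a> is ignored.
            own_text = th.find(string=True, recursive=False)
            if own_text is not None:  # a <th> may have no direct text at all
                years.append(own_text.strip())
    return years


for year in header_years(html):
    print(year)
```

With this sample markup the loop prints 1926, 1931 and 1939 on separate lines.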

CodePudding user response:

I think what you are looking for is to omit the recursive search with th.find(string=True, recursive=False), so that only text nodes that are direct children of the th are considered.
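To see why `recursive=False` matters, here is a small contrast on a made-up cell where the footnote comes before the year (a hypothetical layout, chosen so the first string in document order sits inside the nested a tag). `get_text()` joins all descendant text, a recursive `find(string=True)` returns the first string anywhere inside the th, and the non-recursive search returns the th's own text.

```python
from bs4 import BeautifulSoup

# Hypothetical cell: the footnote precedes the year on purpose.
cell = '<th><sup><a href="#note">[22]</a></sup>1926</th>'
th = BeautifulSoup(cell, "html.parser").th

everything = th.get_text()                         # all descendant text: "[22]1926"
first_any = th.find(string=True)                   # first string among descendants: "[22]"
first_own = th.find(string=True, recursive=False)  # first direct-child string: "1926"

print(everything, first_any, first_own, sep=" | ")
```

In the question's markup the year happens to come first, so the difference only shows up with `get_text()`/`.text`, but relying on `recursive=False` keeps the result correct regardless of where the footnote sits.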

I don't understand what this part of your code is meant to do:

if(th.find('a')):
    print(th.text)

As written, it prints a th only if it contains an <a> element. Your description suggests you are trying to achieve the opposite.
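If the goal is to read every th and simply drop the footnote, a different technique (not used in either answer) is to remove the sup elements before reading the text, with no `if th.find('a')` filter at all. This is a sketch on hypothetical markup; note that `decompose()` deletes the tags from the parsed tree, so run it on a soup you do not need unmodified afterwards. Unlike `recursive=False`, it also keeps text that appears after the sup inside the same cell.

```python
from bs4 import BeautifulSoup

# Hypothetical row: one header with a footnote link, one without.
html = """
<tr>
  <th scope="col">1926<sup><a href="#cite_note-22">[22]</a></sup></th>
  <th scope="col">Total</th>
</tr>
"""


def header_texts(markup):
    """Return each <th>'s text with any <sup> footnote markers removed."""
    soup = BeautifulSoup(markup, "html.parser")
    texts = []
    for th in soup.find_all("th"):  # every cell is kept, with or without an <a>
        for sup in th.find_all("sup"):
            sup.decompose()  # removes the footnote from this parsed tree
        texts.append(th.get_text(strip=True))
    return texts


print(header_texts(html))
```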
