How to get the text of a th without the text of its nested a tag
<th scope="col">1926
<sup id="cite_ref-2011CH_22-0" >
<a href="#cite_note-2011CH-22">[22]</a>
</sup>
</th>
I tried:
table = soup.find('table', {"class": "standard"})
data_th = table.find('tbody').find_all('tr', {"class": "bright"})
for tr in data_th:
    th_list = tr.find_all('th')
    for th in th_list:
        if(th.find('a')):
            print(th.text)
but the output is:
1926[22]
1931[23]
1939[23]
and I need:
1926
1931
1939
CodePudding user response:
One approach is to select only the text that is a direct child of your target, skipping the text of nested tags:
th.find(text=True, recursive=False)
Example
from bs4 import BeautifulSoup
html='''
<th scope="col">1926
<sup id="cite_ref-2011CH_22-0" >
<a href="#cite_note-2011CH-22">[22]</a>
</sup>
another text
</th>
'''
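# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
for th in soup.find_all('th'):
    # first direct text node of the th, with the trailing newline stripped
    print(th.find(text=True, recursive=False).strip())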
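Note that find returns only the first direct text node, so "another text" in the sample above is not printed. If you also need the text that follows the sup, a minimal sketch (assuming Beautiful Soup 4.4.0 or later, where string is the newer name for the text argument) could collect every direct text node with find_all:

```python
from bs4 import BeautifulSoup

# Same sample markup as in the example above.
html = '''
<th scope="col">1926
<sup id="cite_ref-2011CH_22-0">
<a href="#cite_note-2011CH-22">[22]</a>
</sup>
another text
</th>
'''

soup = BeautifulSoup(html, 'html.parser')
th = soup.find('th')

# Only the first text node that is a direct child of the th.
first = th.find(string=True, recursive=False).strip()

# Every direct text node, skipping whitespace-only ones.
parts = [s.strip() for s in th.find_all(string=True, recursive=False) if s.strip()]
direct = ' '.join(parts)

print(first)
print(direct)
```

In both cases the text inside sup and a is left out, because recursive=False restricts the search to the direct children of th.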
CodePudding user response:
I think what you are looking for is to omit the recursive search, with th.find(text=True, recursive=False).
I don't understand what this part of your code:
if(th.find('a')):
    print(th.text)
means. As written, it prints the text only if there is an a element inside the th, and th.text includes the text of that a element. Your description suggests you are trying to achieve the opposite.
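Putting this together with the loop from the question, here is a rough sketch. The table markup is made up for illustration (only the class names "standard" and "bright" come from the question), own_text is a hypothetical helper name, and string=True assumes Beautiful Soup 4.4.0 or later:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the table described in the question.
html = '''
<table class="standard">
<tbody>
<tr class="bright">
<th scope="col">1926<sup id="cite_ref-2011CH_22-0"><a href="#cite_note-2011CH-22">[22]</a></sup></th>
<th scope="col">1931<sup><a href="#cite_note-23">[23]</a></sup></th>
<th scope="col">Total</th>
</tr>
</tbody>
</table>
'''


def own_text(tag):
    """Return the first direct text node of tag, stripped ('' if there is none)."""
    node = tag.find(string=True, recursive=False)
    return node.strip() if node is not None else ''


soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'standard'})

years = []
for tr in table.find('tbody').find_all('tr', {'class': 'bright'}):
    for th in tr.find_all('th'):
        # Keep the question's filter: only headers that contain a link.
        if th.find('a'):
            years.append(own_text(th))

print(years)
```

The if th.find('a') check is kept so that headers without a footnote link (like "Total" here) are skipped. If you want every header, drop the check.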