Sample HTML code:
<div>
Hello everyone how are you
<sup>Hello hi</sup>
<figure>Blah Blah<img /></figure>
</div>
I tried using the decompose() function in BeautifulSoup, but it also destroys the <sup> tag. Can anyone help me out?
CodePudding user response:
To get the text of the <sup> tag:
from bs4 import BeautifulSoup
html_doc = """\
<div>
Hello everyone how are you
<sup>Hello hi</sup>
<figure>Blah Blah<img /></figure>
</div>"""
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.sup.text)
Prints:
Hello hi
To remove the <img /> tag:
soup.img.extract()
print(soup.div)
Prints:
<div>
Hello everyone how are you
<sup>Hello hi</sup>
<figure>Blah Blah</figure>
</div>
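Since the question mentions decompose(), here is a minimal sketch of using it on a single tag, assuming the goal is to drop the whole <figure> while keeping <sup> (the question does not spell this out, so treat that as a guess). decompose() destroys the tag it is called on along with everything nested inside it, so calling it on the enclosing <div> would take <sup> with it.

```python
from bs4 import BeautifulSoup

html_doc = """\
<div>
Hello everyone how are you
<sup>Hello hi</sup>
<figure>Blah Blah<img /></figure>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Assumption: only <figure> (and the <img /> inside it) should go.
# decompose() removes the tag it is called on plus all of its children,
# so it is called on <figure> here, not on the surrounding <div>.
soup.figure.decompose()

sup_text = soup.sup.text          # <sup> is still in the tree
remaining = soup.div.get_text()   # text left in the <div> after the removal
print(sup_text)
print(remaining)
```

extract() removes the tag in the same way but returns it, which helps if the removed piece is still needed afterwards.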
CodePudding user response:
from bs4 import BeautifulSoup
html_doc = """\
<div>
Hello everyone how are you
<sup>Hello hi</sup>
<figure>Blah Blah<img /></figure>
</div>"""
soup = BeautifulSoup(html_doc, 'lxml')
a = soup.find('div')
b = a.find('sup').text
print(b)
Sorry if something isn't right, but I am on my phone and can't test it. You also need to run pip install lxml first, and in place of file.html (html_doc in the code above) put your own file or the website's HTML.
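For the file case, a rough sketch of what that could look like. The name file.html is only a placeholder, and the sample is written to a temporary directory first so the snippet runs on its own. The parser is swapped to the built-in "html.parser" here so it works without installing lxml; pass 'lxml' instead if you have it.

```python
import os
import tempfile

from bs4 import BeautifulSoup

html_doc = """\
<div>
Hello everyone how are you
<sup>Hello hi</sup>
<figure>Blah Blah<img /></figure>
</div>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder file name; in practice this would be your own HTML file.
    path = os.path.join(tmp_dir, "file.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_doc)

    # BeautifulSoup accepts an open file handle as well as a string.
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

sup_text = soup.find("div").find("sup").text
print(sup_text)
```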