I am trying to extract a single value, $82.76, from the HTML below:
<div>
    <h6>HEC Price</h6>
    <h5>$82.76</h5>
</div>
My code is:
from bs4 import BeautifulSoup
with open('HectorDAO.mhtml', 'r') as html_file:
    content = html_file.read()
    soup = BeautifulSoup(content, 'html.parser')
    tags = soup.find('h6', text='HEC Price')
    tag = tags.next_sibling.get_text()
    print(tag)
I expect $82.76, but I get this strange output instead:
$8=
2.76
What am I doing wrong here?
CodePudding user response:
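Changing .next_sibling to .find_next_sibling() does the trick. .next_sibling returns the very next node, which may be a whitespace text node, whereas find_next_sibling() returns the next tag.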
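from bs4 import BeautifulSoup
with open('HectorDAO.mhtml', 'r') as html_file:
    content = html_file.read()
    soup = BeautifulSoup(content, 'html.parser')
    tags = soup.find('h6', text='HEC Price')
    # find_next_sibling() skips text nodes and returns the next tag
    tag = tags.find_next_sibling().get_text()
    print(tag)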
EDIT:
I would not recommend chaining .next_sibling calls, since find_next_sibling() does this for you. See the Beautiful Soup documentation.
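To illustrate the difference, here is a minimal sketch that uses an inline HTML string modeled on the snippet in the question (the html, label and price names are only illustrative). It uses string=, the newer spelling of the text= argument.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the relevant part of the page
html = """<div>
<h6>HEC Price</h6>
<h5>$82.76</h5>
</div>"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find("h6", string="HEC Price")

# .next_sibling is the newline text node between </h6> and <h5>
print(repr(label.next_sibling))

# find_next_sibling() skips text nodes and returns the next tag, here <h5>
price = label.find_next_sibling().get_text()
print(price)
```

With this layout, .next_sibling would have to be called twice to reach the h5 tag, which is why find_next_sibling() is the less fragile choice.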
CodePudding user response:
tags = soup.find('h6', text='HEC Price')
tag = tags.find_next_sibling().get_text()
print(tag)
It might work if you call find_next_sibling() instead.
Also try decoding the file with the standard-library quopri module:
from quopri import decodestring
from bs4 import BeautifulSoup
# Read raw bytes so decodestring() sees the file exactly as stored
with open('HectorDAO.mhtml', 'rb') as html_file:
    # Undo the quoted-printable encoding, then decode the bytes (UTF-8 by default)
    content = decodestring(html_file.read()).decode()
    soup = BeautifulSoup(content, 'html.parser')
    tags = soup.find('h6', text='HEC Price')
    tag = tags.find_next_sibling().get_text()
    print(tag)
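Or try both; MHTML can be tricky. The HTML inside an MHTML file is usually stored as quoted-printable, which wraps long lines by ending them with "=" (a soft line break). That is where the "=" and the line break in the middle of the price come from.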
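Another option is to treat the MHTML file as the MIME message it is and let the standard-library email package undo the quoted-printable encoding. The following is a minimal sketch, assuming the page lives in a text/html part; the sample bytes and the extract_html helper are made up for illustration and are not taken from the real HectorDAO.mhtml file.

```python
from email import message_from_bytes

# Tiny hypothetical stand-in for an MHTML file; real files are much larger.
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<div><h6>HEC Price</h6><h5>$8=\r\n"
    b"2.76</h5></div>\r\n"
    b"--BOUNDARY--\r\n"
)


def extract_html(raw: bytes) -> str:
    """Return the decoded body of the first text/html part."""
    message = message_from_bytes(raw)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # decode=True reverses the Content-Transfer-Encoding
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    raise ValueError("no text/html part found")


html = extract_html(sample)
print(html)
```

For a real file, read it with open('HectorDAO.mhtml', 'rb') and pass the bytes to extract_html; the returned string can then be given to BeautifulSoup(html, 'html.parser') as in the snippets above.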