Extracting value with beautifulsoup get_text() problem

Time:12-17

I am trying to extract a single value, $82.76, from the HTML below.

<div>
    <h6>HEC Price</h6>
    <h5>$82.76</h5>
</div>

My code is

from bs4 import BeautifulSoup

with open('HectorDAO.mhtml', 'r') as html_file:
    content = html_file.read()
    
    soup = BeautifulSoup(content, 'html.parser')
    
    tags = soup.find('h6', text='HEC Price')
    tag = tags.next_sibling.get_text()
    print(tag)

I expect $82.76 but somehow I get strange output

$8= 
2.76

What am I doing wrong here?

CodePudding user response:

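Changing '.next_sibling' to '.find_next_sibling()' does the trick. '.next_sibling' returns the very next node, which can be a whitespace text node, while '.find_next_sibling()' returns the next tag.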

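from bs4 import BeautifulSoup

with open('HectorDAO.mhtml', 'r') as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, 'html.parser')

# find_next_sibling() skips text nodes and returns the next tag after the <h6>
tags = soup.find('h6', text='HEC Price')
tag = tags.find_next_sibling().get_text()
print(tag)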

EDIT:

I would not recommend chaining .next_sibling calls (for example .next_sibling.next_sibling) to step over whitespace, as find_next_sibling() does this for you. See the Beautiful Soup documentation.
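
To see the difference between the two, here is a small sketch that runs on the snippet from the question as an inline string instead of the MHTML file. The variable names are only for illustration, and string= is the newer name for the text= argument used above.

```python
from bs4 import BeautifulSoup

# The snippet from the question, inlined for illustration
html = """<div>
    <h6>HEC Price</h6>
    <h5>$82.76</h5>
</div>"""

soup = BeautifulSoup(html, 'html.parser')
label = soup.find('h6', string='HEC Price')

# .next_sibling is the very next node: here the whitespace between the two tags
whitespace_node = label.next_sibling

# .find_next_sibling() skips text nodes and returns the next tag (the <h5>)
price = label.find_next_sibling().get_text()

print(repr(whitespace_node))
print(price)
```

If the markup has no whitespace between the two tags, both approaches return the same element, so this alone may not explain the broken value.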

CodePudding user response:

tags = soup.find('h6', text='HEC Price')
tag = tags.find_next_sibling().get_text()
print(tag)

It might work if you call find_next_sibling() instead.

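Also try decoding the file with quopri first. MHTML content is often quoted-printable encoded, and the "=" followed by a line break in your output looks like a quoted-printable soft line break: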

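from quopri import decodestring
from bs4 import BeautifulSoup

# Open in binary mode: quoted-printable data is handled as bytes
with open('HectorDAO.mhtml', 'rb') as html_file:
    # Undo the quoted-printable encoding, then decode the bytes to str (UTF-8 by default)
    content = decodestring(html_file.read()).decode()

soup = BeautifulSoup(content, 'html.parser')

tags = soup.find('h6', text='HEC Price')
tag = tags.find_next_sibling().get_text()
print(tag)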

Or try both; MHTML can be tricky.
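
Since an MHTML file is a MIME message, another option is to let the standard library's email package pick out the HTML part and undo the transfer encoding. This is only a sketch: the message below is a made-up miniature stand-in for the real file, and extract_html is a hypothetical helper name, not something from the question.

```python
import email
from email import policy

# Made-up miniature MIME message standing in for HectorDAO.mhtml (illustration only).
# Note the quoted-printable soft line break ("=" + CRLF) in the middle of the price.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=\"utf-8\"\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<div><h6>HEC Price</h6><h5>$8=\r\n"
    b"2.76</h5></div>\r\n"
)

def extract_html(raw_bytes):
    """Return the first text/html part of a MIME message as str, or None."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            # get_content() undoes the transfer encoding and decodes the charset
            return part.get_content()
    return None

html = extract_html(raw)
print(html)
```

For the real file you would read it with open('HectorDAO.mhtml', 'rb') and pass the bytes to extract_html, then hand the result to BeautifulSoup as before.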
