Webscraping BeautifulSoup object not extractable due to formatting problem

Time:07-04

I want to extract information from a website, but I cannot reach it because the HTML is formatted in a way that breaks the attribute values. In the HTML below, I would like to extract the text "mtl.". As you can see, after class= the value starting with '3D "closes" before the whole class name is finished. I have tried every variant I could think of to select the "mtl." span, but none of them worked.

<div class='3D"ServiceOffer_badge__kriSF"'>
  <div>
   <p ce__offering___1cjqq="" class='3D"Pri=' inline;"="" price__brand___2kedu="" price__large="___35JMV" price__price___38oh2="" price__price___38oh2"="" style='3D"display:'>
    <span class='3D"Pr=' ice__value___wawnq"="">
     0 =E2=82=AC
    </span>
    <span class='3D"Price__suffix___1=' d8-m"="">
     mtl.
    </span>

Do you have any idea how to do this? Thank you so much in advance!

CodePudding user response:

from bs4 import BeautifulSoup


html = '''<div class='3D"ServiceOffer_badge__kriSF"'>
  <div>
   <p ce__offering___1cjqq="" class='3D"Pri=' inline;"="" price__brand___2kedu="" price__large="___35JMV" price__price___38oh2="" price__price___38oh2"="" style='3D"display:'>
    <span class='3D"Pr=' ice__value___wawnq"="">
     0 =E2=82=AC
    </span>
    <span class='3D"Price__suffix___1=' d8-m"="">
     mtl.
    </span>'''
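
# Parse the markup as-is with the built-in parser.
soup = BeautifulSoup(html, 'html.parser')
# Match the class value exactly as the parser sees it: the quoted string
# ends early, so the class is the literal text 3D"Price__suffix___1=
spanStr = soup.find('span', {'class':'3D"Price__suffix___1='}).text.strip()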

print(spanStr)

Output:

mtl.
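
A possible root cause, which the post itself does not confirm: the `3D"` and `=E2=82=AC` fragments look like quoted-printable encoding, which is what saved .mht/.mhtml pages and e-mail bodies typically use (`=3D` stands for `=`, `=E2=82=AC` for the euro sign, and a trailing `=` marks a soft line break). If that is the case here, decoding the raw source before parsing may be cleaner than matching the mangled class name. The sketch below uses only the standard library; the `raw` bytes are a hypothetical reconstruction of what the original source might have looked like, not the actual file.

```python
import quopri

# Hypothetical reconstruction of the raw, still-encoded markup.
# Assumption: the source is quoted-printable, with "=" + newline
# acting as a soft line break in the middle of attribute values.
raw = (
    b'<div class=3D"ServiceOffer_badge__kriSF"><div>\n'
    b'<span class=3D"Pr=\n'
    b'ice__value___wawnq">0 =E2=82=AC</span>\n'
    b'<span class=3D"Price__suffix___1=\n'
    b'd8-m">mtl.</span></div></div>\n'
)

# decodestring() works on bytes: it turns =3D back into "=",
# joins the soft-wrapped lines and restores the raw UTF-8 bytes.
decoded = quopri.decodestring(raw).decode("utf-8")

print(decoded)
```

If the assumption holds, the decoded string can be passed to BeautifulSoup(decoded, 'html.parser') and the span can be selected by its ordinary class name instead of the broken one.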