BeautifulSoup: how can I remove a div element from an h1 tag?

Time:10-31

I tried the code below to parse HTML with BeautifulSoup.

item_detail_soup = BeautifulSoup(html, "html.parser")
h1 = item_detail_soup.find("h1")

The parsed h1 looks like this:

<h1>
<div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>
〔NEW〕 iPhone12 256GB </h1>

I'm trying to remove the div with the class name brand.

My desired output:

<h1> (NEW) iPhone12 256GB </h1>

I tried extract() followed by replace, but it failed.

h1 = item_detail_soup.find("h1")
h1 = h1.replace(item_detail_soup.find("h1").div.extract(),'')

How can I get the desired output?

CodePudding user response:

Try this

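from bs4 import BeautifulSoup

item_detail_soup = BeautifulSoup(html, "html.parser")
# Remove every <div class="brand"> from the tree before reading the h1.
for div in item_detail_soup.find_all("div", {"class": "brand"}):
    div.decompose()
h1 = item_detail_soup.find("h1")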

CodePudding user response:

You were on the right track. To reach your goal you can use .extract(), .replace_with() or .decompose().

What is the difference between extract() and decompose()?

.extract() removes a tag or string from the tree and returns it, so you can keep it as a separate parse tree. .decompose() removes the tag from the tree and destroys it and its contents completely.
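A minimal sketch of that difference, using the markup from the question. The variable names are only illustrative, and "html.parser" is used here so that no extra parser has to be installed:

```python
from bs4 import BeautifulSoup

html = '<h1><div class="brand">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'

# extract(): the div leaves the tree but is returned and stays usable.
soup = BeautifulSoup(html, "html.parser")
removed = soup.h1.div.extract()
extracted_text = removed.get_text()   # text of the detached div
h1_after_extract = str(soup.h1)

# decompose(): the div is removed and destroyed; the call returns None.
soup = BeautifulSoup(html, "html.parser")
result = soup.h1.div.decompose()
h1_after_decompose = str(soup.h1)

print(extracted_text)
print(h1_after_extract)
print(result)
print(h1_after_decompose)
```

Both calls leave the same h1 behind. Reach for extract() only if you still need the removed div afterwards, for example to read the brand name.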

Example

Note that you only need one of these options at a time, so the other two are commented out.

from bs4 import BeautifulSoup

html = '''<h1><div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'''
soup = BeautifulSoup(html, 'lxml')
for item in soup.select('div.brand'):
    item.extract()
    #item.decompose()
    #item.replace_with('')
    
print(soup.h1)

Output

<h1>〔NEW〕 iPhone12 256GB </h1>
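If the page has other div.brand elements that should stay, the removal can be limited to the h1. The sketch below is one way to do that. The second div in the sample markup is made up for illustration, and get_text(strip=True) is used on the assumption that only the title text is needed:

```python
from bs4 import BeautifulSoup

# Sample markup: the second div is hypothetical and should be left alone.
html = '''<h1><div class="brand">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>
<div class="brand">Other brand</div>'''

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

# Search only inside the h1, and guard against a missing div.
brand = h1.find("div", class_="brand")
if brand is not None:
    brand.decompose()

title = h1.get_text(strip=True)
remaining = len(soup.find_all("div", class_="brand"))

print(title)
print(remaining)
```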