I tried the code below to parse HTML using BeautifulSoup.
item_detail_soup = BeautifulSoup(html, "html.parser")
h1 = item_detail_soup.find("h1")
The parsed h1 output is:
<h1>
<div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>
〔NEW〕 iPhone12 256GB </h1>
I'm trying to remove the div with the class name brand.
My desired output:
<h1> (NEW) iPhone12 256GB </h1>
I tried extract() followed by replace(), but it failed.
h1 = item_detail_soup.find("h1")
h1 = h1.replace(item_detail_soup.find("h1").div.extract(),'')
How can I get the desired output?
CodePudding user response:
Try this
item_detail_soup = BeautifulSoup(html, "html.parser")
# Remove every <div class="brand"> from the tree
for div in item_detail_soup.find_all("div", {'class': 'brand'}):
    div.decompose()
h1 = item_detail_soup.find("h1")
CodePudding user response:
You were on the right track. To reach your goal you can use .extract(), .replace_with() or .decompose().
What is the difference between extract and decompose?
.extract() removes a tag or string from the tree and returns it, so it stays available as a separate parse tree, while .decompose() removes the tag from the tree and completely destroys it and its contents.
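The difference can be seen in a minimal sketch. It assumes the built-in "html.parser" and a shortened version of the markup from the question; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<h1><div class="brand">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'

# extract(): the tag leaves the tree but is returned and stays usable.
soup = BeautifulSoup(html, "html.parser")
removed = soup.h1.div.extract()
extracted_text = removed.get_text()   # the brand text is still available
h1_after_extract = str(soup.h1)

# decompose(): the tag is removed and destroyed; the call returns None.
soup = BeautifulSoup(html, "html.parser")
result = soup.h1.div.decompose()
h1_after_decompose = str(soup.h1)

print(extracted_text)
print(h1_after_extract)
print(result)
```

So extract() is the one to pick if the removed brand text is still needed afterwards; otherwise decompose() is enough.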
Example
Be aware that you can only use one option at a time, so I commented the others out.
from bs4 import BeautifulSoup
html = '''<h1><div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'''
soup = BeautifulSoup(html, 'lxml')
for item in soup.select('div.brand'):
    item.extract()
    #item.decompose()
    #item.replace_with('')
soup.h1
Output
<h1>〔NEW〕 iPhone12 256GB </h1>
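Note that find() returns a Tag object, not a str, so string methods such as replace() cannot be applied to it directly; that is likely why the attempt in the question failed. If you need the heading as plain text, something like the following sketch may help. The helper name heading_text is hypothetical, and the sketch assumes "html.parser" and the multi-line markup from the question.

```python
from bs4 import BeautifulSoup


def heading_text(html: str) -> str:
    """Hypothetical helper: text of the first <h1> without its div.brand."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    if h1 is None:
        return ""
    # Drop the brand div(s) inside the heading before reading the text.
    for div in h1.select("div.brand"):
        div.decompose()
    # strip=True trims the whitespace left around the removed div.
    return h1.get_text(strip=True)


html = (
    '<h1>\n'
    '<div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>\n'
    '〔NEW〕 iPhone12 256GB </h1>'
)
title = heading_text(html)
print(title)
```

Use str(h1) instead of get_text() if you want to keep the surrounding h1 markup.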