I have an HTML snippet (no other parent elements):
html = '<div id="mydiv"><p>Hello</p><p>Goodbye</p>[...]</div>'
How do I extract all the tags and text (which may vary) within the div, but not the div tag itself? I.e.:
target_str = '<p>Hello</p><p>Goodbye</p>[...]'
I have tried:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
mydiv = soup.find(id='mydiv')
print(mydiv)
>>> '<div id="mydiv"><p>Hello</p><p>Goodbye</p>[...]</div>'
mydiv.unwrap()
print(mydiv)
>>> '<div id="mydiv"></div>'
How do I get just the contents of the tag?
CodePudding user response:
Try joining the string form of each child in the tag's .contents:
from bs4 import BeautifulSoup
html = '<div id="mydiv"><p>Hello</p><p>Goodbye</p>[...]</div>'
soup = BeautifulSoup(html, "html.parser")
print("".join(map(str, soup.select_one("#mydiv").contents)))
Prints:
<p>Hello</p><p>Goodbye</p>[...]
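As a possible alternative, Beautiful Soup tags also have a decode_contents() method, which renders a tag's children as a string without the tag itself. The sketch below reuses the html string from the question (the variable names inner_html and emptied are only illustrative) and also shows why unwrap() seemed to give an empty div: it moves the children up into the parent and returns the now-empty tag, so the content ends up in soup rather than in mydiv.

```python
from bs4 import BeautifulSoup

html = '<div id="mydiv"><p>Hello</p><p>Goodbye</p>[...]</div>'

soup = BeautifulSoup(html, "html.parser")
mydiv = soup.find(id="mydiv")

# decode_contents() returns the inner markup of the tag as a str,
# leaving the tree unchanged.
inner_html = mydiv.decode_contents()
print(inner_html)

# unwrap() replaces the tag with its children in the parent
# and returns the detached, now-empty tag.
emptied = mydiv.unwrap()
print(emptied)  # the empty div seen in the question
print(soup)     # the children now sit directly under soup
```

If you only need the string, decode_contents() avoids modifying the tree; unwrap() is more suitable when you actually want to remove the wrapper from the parsed document.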