I have loaded webpage with BeautifulSoup and I want to remove those p tags (everyting between < >), and replace p tag only with newlines so it's neatly formatted.
from bs4 import BeautifulSoup
with open('nosleep2.html', encoding='utf-8') as webpage:
soup = BeautifulSoup(webpage, 'lxml')
post = soup.find('div', class_ = 'RichTextJSON-root')
print(post)
This is the part of the output
<p >I’m using this post to chronicle the events of this evening, as they have been truly fascinating.</p><p >Some quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to
unexplainable circumstances are our bread and butter.</p>
It should look like this:
I’m using this post to chronicle the events of this evening, as they have been truly fascinating.
Some quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to
unexplainable circumstances are our bread and butter.
I've tried .get_text()
but that only outputs text which has no newlines.
CodePudding user response:
Use get_text()
with its separator parameter:
soup.get_text('\n')
Example
from bs4 import BeautifulSoup
html = '''<p >I’m using this post to chronicle the events of this evening, as they have been truly fascinating.</p><p >Some quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to
unexplainable circumstances are our bread and butter.</p>'''
soup = BeautifulSoup(html)
soup.get_text('\n')
Output
I’m using this post to chronicle the events of this evening, as they have been truly fascinating.\nSome quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to\nunexplainable circumstances are our bread and butter.