Home > Software engineering >  Is there a way to replace <p> and </p> tags with newlines using Python and BeautifulSoup
Is there a way to replace <p> and </p> tags with newlines using Python and BeautifulSoup

Time:01-10

I have loaded webpage with BeautifulSoup and I want to remove those p tags (everyting between < >), and replace p tag only with newlines so it's neatly formatted.

from bs4 import BeautifulSoup

with open('nosleep2.html', encoding='utf-8') as webpage:
    soup = BeautifulSoup(webpage, 'lxml')

    post = soup.find('div', class_ = 'RichTextJSON-root')

    print(post)

This is the part of the output

<p >I’m using this post to chronicle the events of this evening, as they have been truly fascinating.</p><p >Some quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to
unexplainable circumstances are our bread and butter.</p>

It should look like this:

I’m using this post to chronicle the events of this evening, as they have been truly fascinating.
Some quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to
unexplainable circumstances are our bread and butter.

I've tried .get_text() but that only outputs text which has no newlines.

CodePudding user response:

Use get_text() with its separator parameter:

soup.get_text('\n')

Example

from bs4 import BeautifulSoup
html = '''<p >I’m using this post to chronicle the events of this evening, as they have been truly fascinating.</p><p >Some quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to
unexplainable circumstances are our bread and butter.</p>'''
soup = BeautifulSoup(html)

soup.get_text('\n')

Output

I’m using this post to chronicle the events of this evening, as they have been truly fascinating.\nSome quick backstory: the small section of the city that me and my friends live is generally known to be haunted. And even if you’re not a believer in the supernatural… it’s at least a bit eerie. I’m not going to dox myself by stating the location, but let’s just say that “creepy sightings” and murders/deaths due to\nunexplainable circumstances are our bread and butter.
  • Related