bs4 Adding space when adding new tag into another

Time:08-30

I'm trying to wrap some text inside a p tag in a strong tag. I managed to do this, but I'm getting some weird spacing:

Working on set design, illustration, graphic design, wardrobe management, prop masters, makeup artists, <strong> special effects supervisors </strong>, and more are some of the responsibilities of this position.

As you can see in this example, there is a space inside the strong tag on both sides of the text, which makes the paragraph look a bit odd because the comma ends up after a space.

My code:

                text = el.text
                el.clear()
                match = re.search(r'\b%s\b' % str(
                    keyword), text, re.IGNORECASE)
                start, end = match.start(), match.end()
                el.append(text[:start])
                
                strong_tag = soup.new_tag('strong')
                strong_tag.append(text[start:end])
                el.append(strong_tag)
                
                el.append(text[end:])

Also, when I save the HTML to a file, it is prettified. Is there a way to keep it minified?

After editing the HTML with bs4, I do:

return soup.decode('utf-8')

and then save the result to an HTML file.

The output looks like this:

<p>
some text
<strong>strong</strong>
rest of the paragraph
</p>

I would really like to keep it on one line, like this:

<p>some text <strong>strong</strong> rest of the paragraph</p>

I hope to find a solution here. Thanks in advance.

CodePudding user response:

The script seems to work: it does not add any extra spaces. It is not clear why you call .decode('utf-8'). You can simply convert your BeautifulSoup object back to a string:

str(soup)    
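To make the difference concrete, here is a small sketch comparing str(soup) with soup.prettify(). The sample markup is made up for illustration. One possible explanation for the prettified file, stated as an assumption rather than something confirmed in this thread: in the bs4 releases I have looked at, the first positional parameter of BeautifulSoup.decode() is a pretty-print switch and not an encoding, so decode('utf-8') may be turning pretty-printing on by accident. That would also account for the spaces around the text inside the strong tag.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the asker's real page
html = '<p>some text <strong>strong</strong> rest of the paragraph</p>'
soup = BeautifulSoup(html, 'html.parser')

compact = str(soup)       # serializes the tree without adding whitespace
pretty = soup.prettify()  # puts nodes on separate, indented lines

print(compact)
print(pretty)
```

If the goal is a compact file, writing the value of str(soup) to disk should keep each paragraph on a single line.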

Example

from bs4 import BeautifulSoup
import re

html = '''<p>some text strong rest of the paragraph</p><p>some text strong rest of the paragraph</p><p>some text strong rest of the paragraph</p>'''

keyword = 'strong'

soup = BeautifulSoup(html, 'html.parser')

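for p in soup.select('p'):
    text = p.text
    # re.escape keeps any regex metacharacters in the keyword literal
    match = re.search(r'\b%s\b' % re.escape(keyword), text, re.IGNORECASE)
    if match is None:
        continue  # keyword not in this paragraph, leave it untouched
    start, end = match.start(), match.end()

    # rebuild the paragraph: text before, <strong>keyword</strong>, text after
    p.clear()
    p.append(text[:start])
    strong_tag = soup.new_tag('strong')
    strong_tag.append(text[start:end])
    p.append(strong_tag)
    p.append(text[end:])

print(str(soup))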

Output

<p>some text <strong>strong</strong> rest of the paragraph</p><p>some text <strong>strong</strong> rest of the paragraph</p><p>some text <strong>strong</strong> rest of the paragraph</p>
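The same approach can be packaged as a reusable function. This is only a sketch: wrap_keyword and its parameters are hypothetical names, not part of bs4. It assumes the keyword might arrive with stray surrounding whitespace, so it strips it first, and it skips paragraphs where the keyword does not occur instead of failing on a None match.

```python
import re
from bs4 import BeautifulSoup


def wrap_keyword(html, keyword, tag_name='strong'):
    """Wrap the first match of keyword in each <p> in a new tag (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # strip() drops stray whitespace; re.escape treats the keyword literally
    pattern = re.compile(r'\b%s\b' % re.escape(keyword.strip()), re.IGNORECASE)
    for p in soup.select('p'):
        text = p.get_text()
        match = pattern.search(text)
        if match is None:
            continue  # leave paragraphs without the keyword untouched
        p.clear()
        p.append(text[:match.start()])
        new_tag = soup.new_tag(tag_name)
        new_tag.append(match.group(0))
        p.append(new_tag)
        p.append(text[match.end():])
    # str() serializes without pretty-printing
    return str(soup)


sample = ('<p>makeup artists, special effects supervisors, and more</p>'
          '<p>no match here</p>')
result = wrap_keyword(sample, ' special effects supervisors ')
print(result)
```

Note that, like the code above, this flattens any markup already nested inside the paragraph, because it rebuilds the paragraph from its plain text.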