I'm trying to wrap some text inside a p tag in a strong tag. I managed to do this, but I'm getting some weird spacing:
Working on set design, illustration, graphic design, wardrobe management, prop masters, makeup artists, <strong> special effects supervisors </strong>, and more are some of the responsibilities of this position.
As you can see in this example, there is a space inside the strong tag, which makes the paragraph look a bit odd because the comma comes after the space.
My code:
text = el.text
el.clear()
match = re.search(r'\b%s\b' % str(keyword), text, re.IGNORECASE)
start, end = match.start(), match.end()
el.append(text[:start])
strong_tag = soup.new_tag('strong')
strong_tag.append(text[start:end])
el.append(strong_tag)
el.append(text[end:])
Also, when I save the HTML to a file, it is prettified. Is there a way to keep it minified?
After editing the HTML with bs4, I do
return soup.decode('utf-8')
and then save the result to an HTML file.
The output looks like this:
<p>
some text
<strong>strong</strong>
rest of the paragraph
</p>
I would really like to keep it as:
<p>some text <strong>strong</strong> rest of the paragraph</p>
I hope to find a solution here. Thanks in advance.
CodePudding user response:
The script seems to work: there are no additional spaces, and it is not clear why you call .decode('utf-8').
You can simply convert your BeautifulSoup object back to a string:
str(soup)
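To illustrate the difference between the two serializations, here is a minimal sketch comparing str(soup) with soup.prettify(). One possible explanation for the prettified output, stated as an assumption rather than something confirmed in this thread: the first positional argument of decode() is not the encoding, so a truthy value such as 'utf-8' may end up enabling pretty-printing. Pretty-printing places tags and strings on separate lines, which a browser then renders as extra spaces around the strong text.

```python
from bs4 import BeautifulSoup

html = '<p>some text <strong>strong</strong> rest of the paragraph</p>'
soup = BeautifulSoup(html, 'html.parser')

# str() serializes the tree without inserting any whitespace
compact = str(soup)

# prettify() puts tags and strings on their own lines and indents them
pretty = soup.prettify()

print(compact)
print(pretty)
```

If the compact form is what you want in the file, writing str(soup) directly should be enough.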
Example
from bs4 import BeautifulSoup
import re
html = '''<p>some text strong rest of the paragraph</p><p>some text strong rest of the paragraph</p><p>some text strong rest of the paragraph</p>'''
keyword = 'strong'
soup = BeautifulSoup(html, 'html.parser')
for p in soup.select('p'):
    text = p.text
    p.clear()
    # find the first whole-word occurrence of the keyword
    match = re.search(r'\b%s\b' % str(keyword), text, re.IGNORECASE)
    start, end = match.start(), match.end()
    # rebuild the paragraph: text before, <strong>keyword</strong>, text after
    p.append(text[:start])
    strong_tag = soup.new_tag('strong')
    strong_tag.append(text[start:end])
    p.append(strong_tag)
    p.append(text[end:])
print(str(soup))
Output
<p>some text <strong>strong</strong> rest of the paragraph</p><p>some text <strong>strong</strong> rest of the paragraph</p><p>some text <strong>strong</strong> rest of the paragraph</p>
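The loop above assumes every paragraph contains the keyword, since re.search() returns None otherwise and match.start() would fail. A slightly more defensive variant could look like the sketch below. The helper name wrap_keyword and the sample HTML are hypothetical, made up for illustration; re.escape() is added on the assumption that the keyword should be matched literally.

```python
import re
from bs4 import BeautifulSoup


def wrap_keyword(soup, el, keyword):
    """Wrap the first whole-word match of keyword in el's text with <strong>.

    Hypothetical helper: returns True if a match was wrapped, False otherwise.
    """
    text = el.get_text()
    # re.escape() makes regex metacharacters in the keyword match literally
    match = re.search(r'\b%s\b' % re.escape(str(keyword)), text, re.IGNORECASE)
    if match is None:
        return False  # no match: leave the element untouched
    el.clear()
    el.append(text[:match.start()])
    strong_tag = soup.new_tag('strong')
    strong_tag.string = match.group(0)
    el.append(strong_tag)
    el.append(text[match.end():])
    return True


# sample input invented for this sketch
html = '<p>prop masters, special effects supervisors, and more</p><p>no match here</p>'
soup = BeautifulSoup(html, 'html.parser')
results = [wrap_keyword(soup, p, 'special effects supervisors') for p in soup.select('p')]
output = str(soup)
print(output)
```

Clearing the element only after a successful match means paragraphs without the keyword keep their original content.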