I am trying to use the Beautiful Soup module with Python to do the following:
Within an HTML div, for each paragraph tag, I want to wrap the first letter of each word in the paragraph in a bold tag. For example:
<div >
<p>The quick brown fox</p>
</div>
which would read: The quick brown fox
would then become
<div >
<p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>
that would read: The quick brown fox
Using bs4, I've been unable to find a good solution for this and am open to ideas.
CodePudding user response:
I don't know much about how Python parses HTML in detail, but I can provide you with some ideas.
To find <p> tags, you can use the regex <p.*?>.*?</p>, or use str.find("<p>") and walk until </p>.
To add <b> tags, perhaps this code will work:
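def add_bold(s: str) -> str:
    ret = ""
    isFirstLet = True  # True when the next non-space character starts a word
    for i in s:
        if isFirstLet and i != " ":
            # Wrap the first letter of the word in a bold tag
            ret += "<b>" + i + "</b>"
            isFirstLet = False
        else:
            ret += i
        if i == " ":
            isFirstLet = True
    return ret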
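As an alternative sketch of the same idea, the character loop can be written as a single regular-expression substitution over the paragraph text. The function name bold_first_letters is hypothetical, and the code assumes the input is plain text with no markup inside it, where a "word" is any run of non-whitespace characters.

```python
import re


def bold_first_letters(text: str) -> str:
    # (?<!\S) succeeds at the start of the string or right after whitespace,
    # so (\S) captures only the first character of each word.
    return re.sub(r"(?<!\S)(\S)", r"<b>\1</b>", text)


print(bold_first_letters("The quick brown fox"))
# <b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox
```

Unlike a \b-based pattern, this does not treat the letter after an apostrophe (as in "don't") as the start of a new word.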
CodePudding user response:
You could use replace_with() combined with a list comprehension:
soup.p.string.replace_with(
    BeautifulSoup(
        ' '.join([f'<b>{s[0]}</b>{s[1:]}' for s in soup.p.string.split()]),
        'html.parser'
    )
)
Example
from bs4 import BeautifulSoup

html = '''
<div >
<p>The quick brown fox</p>
</div>'''
soup = BeautifulSoup(html, 'html.parser')

# Replace only the paragraph's text node, so the <p> tag itself is kept.
# Slicing (s[0], s[1:]) bolds just the first letter of each word,
# and joining with ' ' keeps the spaces between words.
soup.p.string.replace_with(
    BeautifulSoup(
        ' '.join([f'<b>{s[0]}</b>{s[1:]}' for s in soup.p.string.split()]),
        'html.parser'
    )
)
soup
Output
<div>
<p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>
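The example above touches only the first paragraph. To cover every paragraph inside a div, as the question asks, one option is to loop over the paragraphs and rebuild each text node from new <b> tags instead of re-parsing a string. This is only a sketch: the function name bold_first_letters_in_div is hypothetical, and it assumes words are separated by whitespace and are not split across inline tags.

```python
import re

from bs4 import BeautifulSoup


def bold_first_letters_in_div(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        # Only handle paragraphs that sit somewhere inside a <div>.
        if p.find_parent("div") is None:
            continue
        # Copy the text nodes into a list first: the tree changes as we go.
        for node in list(p.find_all(string=True)):
            # The capturing group keeps the whitespace runs in the result.
            for part in re.split(r"(\s+)", node):
                if not part:
                    continue
                if part.isspace():
                    node.insert_before(part)
                    continue
                bold = soup.new_tag("b")
                bold.string = part[0]
                node.insert_before(bold)
                if part[1:]:
                    node.insert_before(part[1:])
            # Remove the original text node now that its pieces are in place.
            node.extract()
    return str(soup)


html = """<div >
<p>The quick brown fox</p>
<p>jumps over</p>
</div>"""
print(bold_first_letters_in_div(html))
```

Because the new nodes are created with new_tag() rather than parsed from a string, characters such as < or & in the paragraph text are not reinterpreted as markup.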