Home > other >  How to wrap initial of each word in a specific tag with a <b>?
How to wrap initial of each word in a specific tag with a <b>?

Time:05-29

I am trying to use the BeautifulSoup module with Python to do the following:

Within a div for HTML, for each paragraph tag, I want to add a bold tag to the first letter of each word within the paragraph. For example:

<div >
    <p>The quick brown fox</p>
</div>

which would read: The quick brown fox

would then become

<div >
    <p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>

that would read: The quick brown fox

Using bs4 i've been unable to find a good solution to do this and am open to ideas.

CodePudding user response:

You could use replace_with() combined with list comprehension - Extract text / string from tag / bs4 object, process it as text and later on replace the tag with new bs4 object:

soup.p.replace_with(
    BeautifulSoup(
        ' '.join([s.replace(s[0],f'<b>{s[0]}</b>') for s in soup.p.string.split(' ')]),'html.parser'
    )
)

Example

from bs4 import BeautifulSoup
html = '''
<div >
    <p>The quick brown fox</p>
</div>'''
soup = BeautifulSoup(html,'html.parser')

soup.p.replace_with(
    BeautifulSoup(
        ' '.join([s.replace(s[0],f'<b>{s[0]}</b>') for s in soup.p.string.split(' ')]),'html.parser'
    )
)

soup
Output
<div >
<b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox
</div>

CodePudding user response:

I don't know much about how Python parses HTML in detail, but I can provide you with some ideas.

To find <p> tags, you can use RegEx <p.*?>.*?</p> or use str.find("<p>") and walk until </p>.

To add <b> tags, perhaps this code will work:

def add_bold(s: str) -> str:
    ret = ""
    isFirstLet = True
    for i in s:
        if isFirstLet:
            ret  = "<b>"   i   "</b>"
            isFirstLet = False
        else:
            ret  = i
        if i == " ": isFirstLet = True
    return ret
  • Related