How to wrap the initial of each word in a specific tag with a <b>?

I am trying to use the BeautifulSoup module with Python to do the following:

Within an HTML div, for each paragraph tag, I want to wrap the first letter of each word in the paragraph in a bold tag. For example:

<div >
    <p>The quick brown fox</p>
</div>

which would read: The quick brown fox

would then become

<div >
    <p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>

that would read: The quick brown fox (with the first letter of each word in bold)

Using bs4, I've been unable to find a good way to do this and am open to ideas.

CodePudding user response：

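You could use replace_with() combined with a list comprehension: extract the text from the tag, process it as a plain string, and then replace the tag with a new bs4 object parsed from the result. Note that this replaces the <p> tag itself, so the surrounding <p> does not appear in the output: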

soup.p.replace_with(
    BeautifulSoup(
        # split() skips empty strings; count=1 wraps only the leading character
        ' '.join([s.replace(s[0], f'<b>{s[0]}</b>', 1) for s in soup.p.string.split()]),
        'html.parser'
    )
)

Example

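from bs4 import BeautifulSoup

html = '''
<div >
    <p>The quick brown fox</p>
</div>'''
soup = BeautifulSoup(html, 'html.parser')

# Replace the <p> tag with a fragment parsed from the rewritten text.
soup.p.replace_with(
    BeautifulSoup(
        # split() skips empty strings; count=1 wraps only the leading character
        ' '.join([s.replace(s[0], f'<b>{s[0]}</b>', 1) for s in soup.p.string.split()]),
        'html.parser'
    )
)

print(soup)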

Output

<div >
<b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox
</div>
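
If the <p> wrapper should stay in place, as in the question's expected result, one possible variation is to replace the text nodes inside each paragraph instead of the paragraph itself. This is only a sketch under a few assumptions: words are separated by whitespace, a word is not split across inline tags (something like qu<i>ick</i> would get two bold letters), and the helper names bold_initials and bold_paragraph_initials are made up for illustration rather than part of bs4.

```python
import re
from html import escape

from bs4 import BeautifulSoup


def bold_initials(text: str) -> str:
    """Return markup with the first character of each word wrapped in <b>."""
    # The capturing group makes re.split keep the whitespace runs.
    parts = re.split(r"(\s+)", text)
    out = []
    for part in parts:
        if part and not part.isspace():
            # Escape so characters such as & or < stay text after re-parsing.
            out.append(f"<b>{escape(part[0])}</b>{escape(part[1:])}")
        else:
            out.append(part)
    return "".join(out)


def bold_paragraph_initials(soup: BeautifulSoup) -> None:
    """Modify soup in place for every <p> that sits inside a <div>."""
    for p in soup.find_all("p"):
        if p.find_parent("div") is None:
            continue  # only paragraphs within a div, as in the question
        # find_all returns a list, so replacing nodes while looping is safe.
        for node in p.find_all(string=True):
            if not node.strip():
                continue  # leave whitespace-only text nodes untouched
            fragment = BeautifulSoup(bold_initials(str(node)), "html.parser")
            node.replace_with(fragment)


html = "<div><p>The quick brown fox</p></div>"
soup = BeautifulSoup(html, "html.parser")
bold_paragraph_initials(soup)
print(soup)
```

Working on text nodes rather than on p.string also means a paragraph that already contains inline tags such as <i> is handled, since p.string is None in that case.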

CodePudding user response：

I don't know much about how Python parses HTML in detail, but I can provide you with some ideas.

To find <p> tags, you can use the regex <p.*?>.*?</p>, or use str.find("<p>") and walk forward until </p>.

To add <b> tags, perhaps this code will work:

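def add_bold(s: str) -> str:
    """Wrap the first letter of each space-separated word in <b> tags."""
    ret = ""
    is_first_letter = True
    for ch in s:
        if ch == " ":
            # A space ends the current word; the next non-space starts a new one.
            ret += ch
            is_first_letter = True
        elif is_first_letter:
            ret += "<b>" + ch + "</b>"
            is_first_letter = False
        else:
            ret += ch
    return ret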
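
Putting the two ideas together (a regex to locate the paragraphs, then a pass over their text) might look roughly like the sketch below, using only the standard library. It assumes the paragraphs are not nested and contain plain text with no inner tags, as in the question's example; for anything more complex a real HTML parser is the safer choice. The names P_TAG, bold_initials and bold_paragraphs are hypothetical.

```python
import re

# One <p ...>...</p> element. \b keeps <pre> and similar tags from matching.
P_TAG = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.DOTALL)


def bold_initials(text: str) -> str:
    """Wrap the first character of each whitespace-separated word in <b>."""
    # (?<!\S) means "not preceded by a non-space", i.e. the start of a word.
    return re.sub(r"(?<!\S)(\S)", r"<b>\1</b>", text)


def bold_paragraphs(html: str) -> str:
    """Apply bold_initials to the text of every <p> element in html."""
    return P_TAG.sub(
        lambda m: m.group(1) + bold_initials(m.group(2)) + m.group(3),
        html,
    )


html = "<div>\n    <p>The quick brown fox</p>\n</div>"
print(bold_paragraphs(html))
```

Text outside the <p> elements is left untouched, because only the middle group of each match is rewritten.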