Add tag to each word within a paragraph Python

Time:05-28

I am trying to use the Beautiful Soup module with Python to do the following:

Within an HTML div, for each paragraph tag, I want to wrap the first letter of each word in the paragraph in a bold tag. For example:

<div >
    <p>The quick brown fox</p>
</div>

which would read: The quick brown fox

would then become

<div >
    <p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>

that would read: The quick brown fox

Using bs4, I've been unable to find a good way to do this and am open to ideas.

CodePudding user response:

I don't know much about how Python parses HTML in detail, but I can provide you with some ideas.

To find <p> tags, you can use a regex such as <p.*?>.*?</p> (note that this pattern also matches tags like <pre>), or use str.find("<p>") and scan forward until the next </p>.

To add <b> tags, perhaps this code will work:

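def add_bold(s: str) -> str:
    ret = ""
    is_first_letter = True
    for ch in s:
        if ch.isspace():
            # Copy whitespace through; the next non-space character starts a word.
            ret += ch
            is_first_letter = True
        elif is_first_letter:
            ret += "<b>" + ch + "</b>"
            is_first_letter = False
        else:
            ret += ch
    return ret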
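
The two ideas above (locating <p> elements with a regex, then bolding the first letter of each word) could be combined roughly as follows. This is only a sketch: the names bold_word_starts and bold_paragraphs are made up for illustration, a "word" is taken to be any run of non-whitespace characters, and the paragraphs are assumed to contain plain text with no nested tags. Regex-based HTML handling is fragile, so treat it as a fallback rather than a robust solution.

```python
import re

# Hypothetical helper: wrap the first character of each whitespace-delimited
# word in <b>...</b>. The lookbehind (?<!\S) matches only at the start of a word.
def bold_word_starts(text: str) -> str:
    return re.sub(r"(?<!\S)(\S)", r"<b>\1</b>", text)

# Matches <p ...>...</p>; the \b keeps the pattern from matching <pre>.
# Assumption: paragraphs hold plain text only (no nested tags).
P_PATTERN = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.DOTALL)

def bold_paragraphs(html: str) -> str:
    # Keep the opening and closing tags as they are; rewrite only the inner text.
    return P_PATTERN.sub(
        lambda m: m.group(1) + bold_word_starts(m.group(2)) + m.group(3),
        html,
    )

if __name__ == "__main__":
    print(bold_paragraphs("<div>\n    <p>The quick brown fox</p>\n</div>"))
```

Text outside <p> elements is left untouched, because only the captured paragraph body is passed to bold_word_starts.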

CodePudding user response:

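You could use replace_with() combined with a list comprehension. Replace the paragraph's text node rather than the <p> tag itself so the tag survives, build each word with slicing so that only its first letter is wrapped, and join with a space so the words stay separated. Note that soup.p only touches the first paragraph:

soup.p.string.replace_with(
    BeautifulSoup(
        ' '.join([f'<b>{s[0]}</b>{s[1:]}' for s in soup.p.string.split()]), 'html.parser'
    )
)

Example

from bs4 import BeautifulSoup
html = '''
<div >
    <p>The quick brown fox</p>
</div>'''
soup = BeautifulSoup(html, 'html.parser')

# Replace the text inside the first <p> with the parsed, bolded fragment.
soup.p.string.replace_with(
    BeautifulSoup(
        ' '.join([f'<b>{s[0]}</b>{s[1:]}' for s in soup.p.string.split()]), 'html.parser'
    )
)

soup
Output
<div>
    <p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>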
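
To cover every paragraph inside a div, as the question asks, one option is to loop over the <p> tags and rebuild each text node from <b> tags created with new_tag(), which avoids re-parsing an HTML string. The sketch below makes a few assumptions: the function name bold_first_letters is invented for illustration, a "word" is any run of non-whitespace characters, and text nested inside inline tags such as <i> should be treated the same way.

```python
import re
from bs4 import BeautifulSoup

def bold_first_letters(html: str) -> str:
    """Wrap the first letter of each word in every <p> that sits inside a <div>."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        # Skip paragraphs that are not inside any <div>.
        if p.find_parent("div") is None:
            continue
        # Snapshot the text nodes first, because the loop below mutates the tree.
        for text in list(p.find_all(string=True)):
            # Split into alternating whitespace / non-whitespace runs, keeping both.
            for token in re.findall(r"\s+|\S+", text):
                if token.isspace():
                    text.insert_before(token)
                    continue
                b = soup.new_tag("b")
                b.string = token[0]
                text.insert_before(b)
                if token[1:]:
                    text.insert_before(token[1:])
            # The pieces now stand in front of the original node; drop the original.
            text.extract()
    return str(soup)

if __name__ == "__main__":
    print(bold_first_letters("<div><p>The quick brown fox</p><p>jumps over</p></div>"))
```

Because the original whitespace tokens are re-inserted unchanged, spacing between words is preserved rather than collapsed to single spaces.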