Replace 'class' attribute with 'href' in Python

Time:10-08

I am trying to modify some HTML with a Python script I am writing. I have been struggling with what should be a simple replacement for the last few days, without success.

<a >1</a> -----> <a href="#PageNo1">1</a>

<a >2</a> -----> <a href="#PageNo2">2</a>

<a >12</a> -----> <a href="#PageNo12">12</a>

<a >20</a> -----> <a href="#PageNo20">20</a>

I simply can't replace the "a class" with the "a href". I've tried something like html_content = html_content.replace("a class", "a href"), and I've also tried doing the replacement with BeautifulSoup, but without success. I couldn't find anything similar on StackOverflow either.

Any ideas?

Answer:

Here is a solution:

from bs4 import BeautifulSoup

s = """
<a >1</a>
<a >2</a>
<div>
    <a >25</a>
</div>
"""

soup = BeautifulSoup(s, 'html.parser')

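# Rewrite every <a> tag: drop its class and link to the page number in its text.
for a in soup.select("a"):
    content = a.get_text(strip=True)
    a.attrs.pop('class', None)  # no KeyError if the tag has no class
    a.attrs['href'] = f"#PageNo{content}"

print(soup)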

Output:

<a href="#PageNo1">1</a>
<a href="#PageNo2">2</a>
<div>
    <a href="#PageNo25">25</a>
</div>
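
If the page also contains links that should be left alone, the loop above could be narrowed. The following is only a sketch under some assumptions: the function name add_page_links is made up for illustration, "page-link" is a placeholder class name (the real class name does not appear in the question), and it assumes that only <a> tags that have a class attribute and purely numeric text should be rewritten.

```python
from bs4 import BeautifulSoup


def add_page_links(html: str) -> str:
    """Swap the class attribute of numbered <a> tags for an href anchor.

    Hypothetical helper, not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    # class_=True matches only <a> tags that carry a class attribute.
    for a in soup.find_all("a", class_=True):
        text = a.get_text(strip=True)
        if not text.isdigit():
            continue  # leave links whose text is not a page number untouched
        del a["class"]
        a["href"] = f"#PageNo{text}"
    return str(soup)


# "page-link" is a placeholder class name, not taken from the question.
sample = (
    '<a class="page-link">1</a>'
    '<a class="page-link">next</a>'
    '<a href="/home">2</a>'
)
result = add_page_links(sample)
print(result)
```

With this sample, only the first link is rewritten: the second has non-numeric text and the third has no class attribute, so both are left as they are.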