I am trying to modify some HTML with a Python script I am writing. I have been struggling with what should be a simple replacement for the last few days, without any success. These are the replacements I need:
<a >1</a>
-----> <a href="#PageNo1">1</a>
<a >2</a>
-----> <a href="#PageNo2">2</a>
<a >12</a>
-----> <a href="#PageNo12">12</a>
<a >20</a>
-----> <a href="#PageNo20">20</a>
I simply can't replace the "a class" with the "a href". I've tried something like this:
html_content = html_content.replace("a class", "a href")
I also tried to do the replacement via BeautifulSoup, but with no success, and I couldn't find anything similar on StackOverflow either.
Any ideas?
Answer:
Here is a solution:
from bs4 import BeautifulSoup
s = """
<a >1</a>
<a >2</a>
<div>
<a >25</a>
</div>
"""
soup = BeautifulSoup(s, 'html.parser')
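for a in soup.select("a"):
    # The link text (e.g. "1") becomes the page number in the href.
    content = a.get_text(strip=True)
    # pop() avoids a KeyError when an anchor has no class attribute.
    a.attrs.pop('class', None)
    a.attrs['href'] = f"#PageNo{content}"
print(soup)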
Output:
<a href="#PageNo1">1</a>
<a href="#PageNo2">2</a>
<div>
<a href="#PageNo25">25</a>
</div>
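If you need this in more than one place, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: the name `link_page_numbers`, the `prefix` parameter, and the class names `page` and `nav` in the sample are made up for illustration, and it assumes only anchors whose text is purely digits should be rewritten.

```python
from bs4 import BeautifulSoup


def link_page_numbers(html, prefix="#PageNo"):
    """Return html with each numeric <a> turned into an in-page link.

    Hypothetical helper: the name and the prefix parameter are illustrative.
    """
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        text = a.get_text(strip=True)
        if not text.isdigit():
            continue  # leave anchors that are not page numbers untouched
        a.attrs.pop("class", None)  # no error if the attribute is missing
        a["href"] = f"{prefix}{text}"
    return str(soup)


# Sample input with made-up class names.
sample = '<a class="page">12</a><a class="nav">Home</a>'
print(link_page_numbers(sample))
```

The plain `html_content.replace("a class", "a href")` from the question only renames the attribute and keeps its old value, which is why working on the parsed tags is the safer route here.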