How to remove links from tags in HTML?

I'm writing a scraper in Python with bs4 and want to remove the links from all 'a' tags.

I have this HTML code:

html_code = '<a href="link">some text</a>'

I want to remove href="link" and get only:

html_code = '<a>some text</a>'

How can I do it?

CodePudding user response：

I would do it the following way:

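from bs4 import BeautifulSoup

html_code = '<a href="link">some text</a>'
# Name the parser explicitly; the <html>/<body> wrapper in the output below comes from lxml
soup = BeautifulSoup(html_code, "lxml")
print("Before")
print(soup.prettify())
# Replace each <a> tag's attribute dict with an empty one
for node in soup.find_all("a"):
    node.attrs = {}
print("After")
print(soup.prettify())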

This gives the output:

Before
<html>
 <body>
  <a href="link">
   some text
  </a>
 </body>
</html>
After
<html>
 <body>
  <a>
   some text
  </a>
 </body>
</html>

Note that this will remove all attributes of all <a> tags.
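
If the other attributes should survive, one option is to drop only the href key from each tag's attrs dict. This is a minimal sketch, not the answerer's code: the class="ext" attribute in the sample markup is made up to show that it is kept, and "html.parser" is chosen here so the fragment is not wrapped in <html>/<body>.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the class attribute is added only to show that it survives
html_code = '<a class="ext" href="link">some text</a>'
soup = BeautifulSoup(html_code, "html.parser")

for node in soup.find_all("a"):
    # attrs behaves like a dict; pop() with a default does nothing if href is absent
    node.attrs.pop("href", None)

result = str(soup)
print(result)
```

Using pop() with a default means tags that never had an href are simply left alone.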

CodePudding user response：

Does this solve your problem?

html_code = html_code.replace(' href="link"','')

Output:

>>> print(html_code)
<a>some text</a>
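
The replace() call above only matches the exact text href="link", so it would miss any other URL. If a plain string is still the goal, a parser-based helper can take a string in and give a string back. The following is a rough sketch under that assumption; strip_hrefs is a hypothetical name, and the example URLs are invented.

```python
from bs4 import BeautifulSoup


def strip_hrefs(html: str) -> str:
    """Return html with the href attribute removed from every <a> tag."""
    # html.parser leaves a fragment as-is instead of wrapping it in <html>/<body>
    soup = BeautifulSoup(html, "html.parser")
    # href=True restricts the search to <a> tags that actually carry an href
    for tag in soup.find_all("a", href=True):
        tag.attrs.pop("href", None)
    return str(soup)


# Hypothetical input with two different link targets
cleaned = strip_hrefs('<a href="https://example.com/page?x=1">some text</a> and <a href="other">more</a>')
print(cleaned)
```

Because the href value is never spelled out in the code, the same helper should work regardless of what each link points to.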

CodePudding user response：

Try:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="link">some text</a>', "html.parser")

# soup.a is only the first <a> tag in the document; this drops all of its attributes
del soup.a.attrs
print(soup.a)

Prints:

<a>some text</a>