I'm writing a scraper in Python with bs4 and want to remove the links from all 'a' tags.
I have this HTML code:
html_code = '<a href="link">some text</a>'
I want to remove href="link" and get only:
html_code = '<a>some text</a>'
How can I do it?
CodePudding user response:
I would do it the following way:
from bs4 import BeautifulSoup
html_code = '<a href="link">some text</a>'
soup = BeautifulSoup(html_code)
print("Before")
print(soup.prettify())
for node in soup.find_all("a"):
    node.attrs = {}
print("After")
print(soup.prettify())
This gives the output:
Before
<html>
 <body>
  <a href="link">
   some text
  </a>
 </body>
</html>
After
<html>
 <body>
  <a>
   some text
  </a>
 </body>
</html>
Note that this removes all attributes from every <a> tag, not just href.
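If only href should go while other attributes stay, deleting that single key is one option. This is a minimal sketch, not part of the original answer: the class="nav" attribute is made up for illustration, and "html.parser" is chosen here so that no <html>/<body> wrapper is added.

```python
from bs4 import BeautifulSoup

# Sample input; the class attribute is an assumption added for illustration
html_code = '<a href="link" class="nav">some text</a>'
soup = BeautifulSoup(html_code, "html.parser")

# href=True matches only <a> tags that actually carry an href attribute
for node in soup.find_all("a", href=True):
    del node["href"]  # drop just this attribute, keep the rest

result = str(soup)
print(result)  # <a class="nav">some text</a>
```

Filtering with href=True means the del never hits a tag that lacks the attribute.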
CodePudding user response:
Does this solve your problem?
html_code = html_code.replace(' href="link"','')
Output:
>>> print(html_code)
<a>some text</a>
CodePudding user response:
Try:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<a href="link">some text</a>', "html.parser")
del soup.a.attrs
print(soup.a)
Prints:
<a>some text</a>
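Keep in mind that soup.a refers only to the first <a> in the document. For a page with several links, a loop over find_all covers them all. A rough sketch, assuming a hypothetical helper name strip_hrefs and made-up sample markup:

```python
from bs4 import BeautifulSoup

def strip_hrefs(html):
    """Return html with the href attribute removed from every <a> tag.

    strip_hrefs is a hypothetical helper name, not a bs4 function.
    """
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all("a"):
        # pop with a default does nothing if the tag has no href
        node.attrs.pop("href", None)
    return str(soup)

# Made-up sample with two links; the second keeps its id attribute
cleaned = strip_hrefs('<p><a href="x">one</a> and <a href="y" id="b">two</a></p>')
print(cleaned)  # <p><a>one</a> and <a id="b">two</a></p>
```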