I have an HTML file with two <a> tags, each with a link and text. I managed to replace the link, but I don't know how to replace the text inside the tag. I don't really understand how tags get changed, and I would like to understand it.
My code:
import requests
from bs4 import BeautifulSoup

link = 'http://127.0.0.1:5500/dat.html'
response = requests.get(link).text  # raw HTML as a plain string

with open('parse.html', 'w', encoding='utf-8') as file:
    file.write(response)

soup = BeautifulSoup(response, 'lxml')  # parsed, but not used below
# plain string replacement: only the URL changes, the link text stays the same
res = response.replace("https://www.google.com/", "https://reddit.com/")
with open("parse.html", "w", encoding="utf-8") as outf:
    outf.write(res)
HTML:
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
What I need:
<body>
<h1>
<a href="https://reddit.com/" target="_blank">reddit</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
Answer:
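You can find all relevant <a> tags and change their attributes and their .string. A tag's attributes work like a dictionary (a["href"] = ...), and assigning to .string replaces the whole content of the tag with the new text: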
from bs4 import BeautifulSoup
html_doc = """
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for a in soup.select('a[href*="google.com"]'):
    a["href"] = "https://reddit.com/"  # change an attribute
    a.string = "reddit"                # replace the text inside the tag
print(soup.prettify())
Prints:
<body>
 <h1>
  <a href="https://reddit.com/" target="_blank">
   reddit
  </a>
 </h1>
 <h1>
  <a href="https://ru.wikipedia.org/wiki/" target="_blank">
   wiki
  </a>
 </h1>
</body>
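To plug this back into the file-based workflow from the question, the edit can happen on the parsed soup before writing parse.html, instead of calling str.replace on the raw HTML. The sketch below is one way to do that, under a few assumptions: replace_link is a hypothetical helper name, it uses "html.parser" (lxml should behave the same here if installed), and it writes str(soup) so the original layout is kept (soup.prettify() would re-indent everything). In the question the HTML string would come from requests.get(link).text rather than the inline sample.

```python
from bs4 import BeautifulSoup


def replace_link(html: str, old_domain: str, new_href: str, new_text: str) -> str:
    """Hypothetical helper: change href and visible text of matching <a> tags."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector: every <a> whose href contains old_domain
    for a in soup.select(f'a[href*="{old_domain}"]'):
        a["href"] = new_href  # attributes behave like a dict on the Tag
        a.string = new_text   # clears the tag's children and inserts new_text
    # str(soup) serializes the tree without re-indenting it
    return str(soup)


if __name__ == "__main__":
    html_doc = """
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""
    result = replace_link(html_doc, "google.com", "https://reddit.com/", "reddit")
    # write the modified HTML, as the question does with parse.html
    with open("parse.html", "w", encoding="utf-8") as outf:
        outf.write(result)
    print(result)
```

Only tags matched by the selector are touched, so the Wikipedia link keeps both its href and its text.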