How to replace a tag in HTML with bs4

Time:05-09

I have an HTML file with two <a> tags, each with a link and text. I managed to replace the link, but I don't know how to replace the text inside the tag. I don't really understand how tags are modified, and I would like to.

My code:

import requests 
from bs4 import BeautifulSoup 

link = 'http://127.0.0.1:5500/dat.html'
response = requests.get(link).text

with open('parse.html', 'w', encoding='utf-8') as file:
    file.write(response)

soup = BeautifulSoup(response, 'lxml')

res = response.replace("https://www.google.com/", "https://reddit.com/")



with open("parse.html", "w") as outf:
    outf.write(res)

HTML:

<body>
<h1>
    <a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>

I need:

<body>
<h1>
    <a href="https://reddit.com/" target="_blank">reddit</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>

CodePudding user response:

You can select the relevant <a> tags with a CSS selector, then change their attributes (here href) and replace their text by assigning to .string:

from bs4 import BeautifulSoup

html_doc = """
<body>
<h1>
    <a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for a in soup.select('a[href*="google.com"]'):
    a["href"] = "https://reddit.com/"
    a.string = "reddit"

print(soup.prettify())

Prints:

<body>
 <h1>
  <a href="https://reddit.com/" target="_blank">
   reddit
  </a>
 </h1>
 <h1>
  <a href="https://ru.wikipedia.org/wiki/" target="_blank">
   wiki
  </a>
 </h1>
</body>
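
Note that prettify() re-indents the markup, which is why the output above does not match the layout asked for in the question. If the original whitespace should be kept, str(soup) can be used instead. The following is a minimal sketch of that variant, not a definitive implementation: the helper name replace_link and its parameters are made up for illustration, and it assumes the match string contains no double quotes, since it is pasted into the CSS selector.

```python
from bs4 import BeautifulSoup


def replace_link(html: str, match: str, new_href: str, new_text: str) -> str:
    """Rewrite every <a> whose href contains `match` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.select(f'a[href*="{match}"]'):
        a["href"] = new_href  # replace the attribute value
        a.string = new_text   # replace everything inside the tag with this text
    # str(soup) serializes without re-indenting, unlike prettify()
    return str(soup)


html_doc = """<body>
<h1>
    <a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
    <a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""

result = replace_link(html_doc, "google.com", "https://reddit.com/", "reddit")
print(result)
```

The returned string can then be written to parse.html with open("parse.html", "w", encoding="utf-8"), as in the question. Keep in mind that assigning to .string replaces all of the tag's contents, so any nested tags inside the <a> would be dropped as well.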