I have gathered the links that need to be removed into a list. The code below builds that list:
target = "www.indigo.com"
hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)
It prints the following output:
['https://www.indigo.com/registration.html']
[]
['https://www.indigo.com/buservfcl.html', 'https://www.indigo.com/2021/07/agents.html']
getDetails holds the complete page source code.
Now, how do I compare getDetails with the hrefs list and remove/decompose every item that is present in the list?
I tried this, but it doesn't work for some reason:
hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)
for z in hrefs:
    getDetails.decompose()
It removed the entire content of getDetails, but I need to remove only the elements that are in the list, not everything.
The output should be the complete HTML except for the links that contain www.indigo.com.
CodePudding user response:
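You have to find the parent tag of each matching link and then call its decompose() method. Collect the tag objects themselves rather than their href strings, because decompose() is a method of a tag, not of a string.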
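from bs4 import BeautifulSoup
html = """<div><a href="www.indigo.com"></a></div>"""
soup = BeautifulSoup(html, "html.parser")
target = "www.indigo.com"
# Keep the Tag objects, not just their href strings
href_tags = [links for links in soup.find_all('a', href=True) if target in links['href']]
for i in href_tags:
    # Removes the enclosing <div> together with the link
    i.parent.decompose()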
Output: soup will be empty, because the only <div> in the document is the parent of the matching link.
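If the goal is to drop only the links and keep the surrounding markup, as the question asks, one option is to call decompose() on the matching tag itself instead of on its parent. The following is a minimal sketch of both variants. The sample markup, the example.org URL, and the helper name remove_links are assumptions made for illustration and are not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one link contains the target, one does not.
html = (
    '<div><p>Register <a href="https://www.indigo.com/registration.html">here</a>.</p>'
    '<p>See <a href="https://example.org/faq.html">the FAQ</a>.</p></div>'
)
target = "www.indigo.com"


def remove_links(markup, target, remove_parent=False):
    """Return the markup with matching links (or their parents) removed."""
    soup = BeautifulSoup(markup, "html.parser")
    # Collect Tag objects first so the tree is not modified while searching.
    matches = [a for a in soup.find_all("a", href=True) if target in a["href"]]
    for a in matches:
        node = a.parent if remove_parent else a
        node.decompose()
    return str(soup)


only_link = remove_links(html, target)                       # drops just the <a>
with_parent = remove_links(html, target, remove_parent=True)  # drops the whole <p>
print(only_link)
print(with_parent)
```

With remove_parent=False the paragraph text around the link survives. With remove_parent=True the whole enclosing paragraph is removed, which is what i.parent.decompose() does.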
From the URL:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.assamcareer.com/2021/06/oil-india-limited.html")
soup = BeautifulSoup(res.text, "html.parser")
target = "www.assamcareer.com"
# Keep the Tag objects whose href contains the target
tags = [links for links in soup.find_all('a', href=True) if target in links['href']]
for i in tags:
    i.parent.decompose()
Updated Answer:
for title in root:
    # ... your code ...
    href_tags = [links for links in getDetails.find_all('a', href=True) if target in links['href']]
    print(href_tags)
    for i in href_tags:
        i.parent.decompose()
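One case the loop above may trip over: when two matching links share the same parent, decomposing that parent for the first link also destroys the second link, so a later i.parent lookup can fail. The sketch below skips links that are already gone. It assumes Beautiful Soup 4.9.0 or newer, which provides the decomposed property. The helper name remove_parents_of_links and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def remove_parents_of_links(soup, target):
    """Decompose the parent of every <a> whose href contains target.

    Returns the number of parent tags that were removed.
    """
    removed = 0
    matches = [a for a in soup.find_all("a", href=True) if target in a["href"]]
    for a in matches:
        # Skip links already destroyed because an earlier match shared their parent.
        if a.decomposed:
            continue
        a.parent.decompose()
        removed += 1
    return removed


# Hypothetical markup: two target links share one <p>, a third link is unrelated.
html = (
    '<div><p><a href="https://www.indigo.com/a.html">A</a> '
    '<a href="https://www.indigo.com/b.html">B</a></p>'
    '<p><a href="https://example.org/c.html">C</a></p></div>'
)
soup = BeautifulSoup(html, "html.parser")
removed = remove_parents_of_links(soup, "www.indigo.com")
remaining = [a["href"] for a in soup.find_all("a", href=True)]
print(removed, remaining)
```

Inside the for title in root loop, the same helper could be called once per page with getDetails as the soup argument.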