How to extract text from html in Python with BeautifulSoup4

Time:04-22

I am trying to extract only the file names from the HTML tags below.

In the end, I would like the output to be:

BeforeStructure.PNG
AfterStructure.PNG

How can I extract only the file names from the code below? Thanks!

<div><br> </div><div><img src="https://azure.com/8dd91aab-0dce-41e2-95d4-eb69e06c68fe/_apis/wit/attachments/c4e5aab1-0877-44ea-ad2d-2c614ac56984?fileName=BeforeStructure.PNG" alt=BeforeStructure.PNG><br> </div><div><br> </div><div><img src="https://azure.com/8dd91aab-0dce-41e2-95d4-eb69e06c68fe/_apis/wit/attachments/af842f67-1a3d-48dc-8c8b-396a28b306ce?fileName=AfterStructure.PNG" alt=AfterStructure.PNG><br> </div>

Answer:

Instead of trying to extract the file name from the link in the src attribute, you can read it directly from the alt attribute, which in this markup holds the same name.

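from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html is a string holding the markup above
for img in soup.find_all("img"):
    print(img.get("alt"))  # get() returns None if an <img> has no alt attribute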

Output:

BeforeStructure.PNG
AfterStructure.PNG
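If an image's alt text is missing or does not match the real file name, a fallback is to read the fileName query parameter from the src URL. This is a minimal sketch, assuming every attachment URL carries a fileName parameter as in the sample above; the helper name file_names_from_src is hypothetical, and URL parsing uses the standard library's urllib.parse.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# Sample markup from the question (escaped quotes removed).
HTML = (
    '<div><br> </div><div><img src="https://azure.com/8dd91aab-0dce-41e2-95d4-eb69e06c68fe/_apis/wit/attachments/'
    'c4e5aab1-0877-44ea-ad2d-2c614ac56984?fileName=BeforeStructure.PNG" alt=BeforeStructure.PNG><br> </div>'
    '<div><br> </div><div><img src="https://azure.com/8dd91aab-0dce-41e2-95d4-eb69e06c68fe/_apis/wit/attachments/'
    'af842f67-1a3d-48dc-8c8b-396a28b306ce?fileName=AfterStructure.PNG" alt=AfterStructure.PNG><br> </div>'
)


def file_names_from_src(html):
    """Hypothetical helper: return the fileName query parameter of every <img src>."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for img in soup.find_all("img", src=True):  # only <img> tags that have a src
        query = parse_qs(urlparse(img["src"]).query)
        # parse_qs maps each key to a list of values; keep the first one, if any
        names.extend(query.get("fileName", [])[:1])
    return names


for name in file_names_from_src(HTML):
    print(name)
```

Images whose URL lacks a fileName parameter are simply skipped rather than raising an error.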

Answer:

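file_name = soup.find("img").get("alt")

This returns only the first match. Note that getText() returns the text inside a tag, and an <img> tag has none, so the name has to come from an attribute such as alt. To collect every file name, use find_all instead.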
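A minimal sketch contrasting find() and find_all() on a shortened version of the question's markup (the src attributes are left out for brevity, which is an assumption for readability only). find() returns the first matching tag or None, so the sketch checks for None before reading the attribute; find_all() always returns a list.

```python
from bs4 import BeautifulSoup

# Shortened sample: src attributes omitted, alt values kept from the question.
HTML = '<div><img alt=BeforeStructure.PNG><br></div><div><img alt=AfterStructure.PNG><br></div>'

soup = BeautifulSoup(HTML, "html.parser")

# find() returns only the first matching tag, or None when nothing matches.
first = soup.find("img")
first_name = first.get("alt") if first is not None else None

# find_all() returns a list of every matching tag (empty if none match).
all_names = [img.get("alt") for img in soup.find_all("img")]

print(first_name)
print(all_names)
```

The None check matters because calling a method on the result of a failed find() raises AttributeError.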
