How to extract an address from an HTML file

I am new to the community. I am working on a project that extracts an address from an HTML file. The specific string I am trying to process is:

<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>

I have tried processing it with manual tools, but I'd like to use Python to process the entire HTML file. Can someone explain how to do this in Python? Thank you in advance.

CodePudding user response：

Use a regular expression to find the addresses:

import re
# html is assumed to already hold the page source as a string
r1 = re.findall(r"<address class=\"?list-card-addr\"?>([^<]+)", html)
print(r1)
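
To try the pattern without a real page, here is a minimal sketch that runs it on an inline sample. The second address, the `sample_html` variable and the `find_addresses` helper are made up for illustration; only the first address comes from the question.

```python
import re

# Sample markup modelled on the snippet in the question.
# The second address is invented purely for illustration.
sample_html = (
    '<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>\n'
    '<address class="list-card-addr">12 Example St, Augusta, GA 30901</address>'
)

# Capture everything between the opening tag and the next "<".
ADDRESS_RE = re.compile(r'<address class="?list-card-addr"?>([^<]+)')


def find_addresses(html):
    """Return the text of every list-card-addr <address> tag in html."""
    return [match.strip() for match in ADDRESS_RE.findall(html)]


print(find_addresses(sample_html))
# ['1867 Central Ave, Augusta, GA 30904', '12 Example St, Augusta, GA 30901']
```

Keep in mind that a pattern like this only matches when the tag is written exactly this way. An extra attribute or a nested tag inside the address will make it miss.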

CodePudding user response：

You can extract the address using BeautifulSoup, which is very handy for accessing elements in HTML and XML documents.

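from bs4 import BeautifulSoup
import requests

# url is the address of the page to scrape (not given in the question)
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html.parser")
# find() returns only the first matching tag, or None if there is none
addr = soup.find("address", class_="list-card-addr")
if addr is not None:
    print(addr.text)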
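
Since the question is about processing an entire HTML file, here is a rough sketch of the same idea using `find_all()` on a file read from disk, so every matching tag is returned and not just the first. The file name `listings.html` and both helper function names are assumptions for illustration, not something from the question.

```python
from bs4 import BeautifulSoup


def extract_addresses(html):
    """Return the text of every <address class="list-card-addr"> tag."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("address", class_="list-card-addr")
    # get_text(strip=True) drops leading and trailing whitespace
    return [tag.get_text(strip=True) for tag in tags]


def extract_addresses_from_file(path="listings.html"):
    """Read a local HTML file (hypothetical default name) and extract addresses."""
    with open(path, encoding="utf-8") as f:
        return extract_addresses(f.read())


# Quick check against the snippet from the question
sample = '<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>'
print(extract_addresses(sample))
# ['1867 Central Ave, Augusta, GA 30904']
```

If the file contains no matching tags, `find_all()` returns an empty list, so there is no need for a `None` check.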