I am building a real estate web scraping script and I am stuck on how to get the href
from this piece of HTML with BeautifulSoup.
HTML of the targeted element:
<a data-v-0354ca3a="" href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light" tabindex="0" style="width: 100%;">
Any suggestions?
Thanks!
CodePudding user response:
Try this:
from bs4 import BeautifulSoup
html = '<a data-v-0354ca3a="" href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light" tabindex="0" style="width: 100%;">'
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
soup.a['href']
Output:
'/kopa-bostad/objekt/4MDBKG9311MD8M25'
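On a full page, some <a> tags may have no href at all, and soup.a['href'] would then raise a KeyError. A minimal sketch of two ways to guard against that, assuming made-up sample markup in which the second link deliberately has no href:

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption for illustration): the second <a> has no href.
html = (
    '<a href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card">Listing</a>'
    '<a class="card">No link</a>'
)
soup = BeautifulSoup(html, 'html.parser')

# tag['href'] raises KeyError when the attribute is missing;
# tag.get('href') returns None instead.
hrefs = [a.get('href') for a in soup.find_all('a')]
print(hrefs)

# href=True restricts the search to tags that actually carry an href.
links = [a['href'] for a in soup.find_all('a', href=True)]
print(links)
```

The second form is usually the more convenient one when you only care about real links.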
CodePudding user response:
BeautifulSoup lets you treat each HTML tag like a dictionary of its attributes, so you can get the href
of the <a> tag like this:
from bs4 import BeautifulSoup
html = '<a data-v-0354ca3a="" href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light" tabindex="0" style="width: 100%;">'
soup = BeautifulSoup(html, 'html.parser')
for i in soup.find_all('a'):
    print(i['href'])
Outputs:
/kopa-bostad/objekt/4MDBKG9311MD8M25
Here, i in the for loop is the variable that holds each matching tag element.
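The href in this element is a relative path, so a scraper will usually need to join it with the site's base URL before requesting the page. A minimal sketch using the standard library's urljoin; BASE_URL below is a placeholder, not the real site address, and has to be replaced with the site you are scraping:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL (an assumption): substitute the real site's address.
BASE_URL = 'https://example.com/'

html = '<a href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card"></a>'
soup = BeautifulSoup(html, 'html.parser')

# Resolve every relative href against the base URL.
absolute_links = [urljoin(BASE_URL, a['href']) for a in soup.find_all('a', href=True)]
print(absolute_links)
```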
If you want to get only the element you mentioned in the question from a full HTML page source, use the class_
parameter of the find_all method, or pass a dictionary that names the tag's class, like:
for i in soup.find_all('a', class_='card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light'):
    print(i['href'])
Or
for i in soup.find_all('a', {'class': 'card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light'}):
    print(i['href'])
Both snippets still output:
/kopa-bostad/objekt/4MDBKG9311MD8M25
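One caveat: when class_ is given a string containing several classes, BeautifulSoup matches the exact string value of the class attribute, so the same classes in a different order will not match. A CSS selector avoids that. The sketch below assumes that the two classes card and v-card--link are enough to identify the listing links (that choice is a guess about the page, not something confirmed by the site), and the extra <a> tags are made-up sample data:

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption): the second link lists its classes in a
# different order, and the third link is unrelated.
html = (
    '<a href="/kopa-bostad/objekt/4MDBKG9311MD8M25" '
    'class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light"></a>'
    '<a href="/other" class="v-card--link card"></a>'
    '<a href="/unrelated" class="nav-link"></a>'
)
soup = BeautifulSoup(html, 'html.parser')

# a.card.v-card--link matches any <a> carrying both classes, in any order.
matches = [tag['href'] for tag in soup.select('a.card.v-card--link')]
for href in matches:
    print(href)
```

This prints the first two hrefs and skips the unrelated link.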
I hope that explanation has cleared up your doubts.
Let me know if this works for you.