Web Scraping: How to extract the href from a piece of HTML?

I am trying to build a real estate web scraping script, and I am stuck on how to get the href out of this piece of HTML from this site using BeautifulSoup.

HTML of the Targeted Element:

<a data-v-0354ca3a="" href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light" tabindex="0" style="width: 100%;">

Any suggestions?

Thanks!

CodePudding user response：

Try this:

from bs4 import BeautifulSoup
html = '<a data-v-0354ca3a="" href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light" tabindex="0" style="width: 100%;">'
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
soup.a['href']

Output:

'/kopa-bostad/objekt/4MDBKG9311MD8M25'
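The href above is a relative path, so a scraper usually has to join it with the site's address before it can request the listing page. Here is a minimal sketch of that step, assuming a placeholder base URL (`https://example.com` is hypothetical, since the real site is not named here) and a shortened version of the tag:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Shortened version of the tag from the question
html = '<a href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card v-card--link">Listing</a>'
soup = BeautifulSoup(html, "html.parser")

# Hypothetical base URL; replace it with the address of the site being scraped
BASE_URL = "https://example.com"

link = soup.a  # first <a> tag, or None if the page has no <a> at all
relative = link.get("href") if link is not None else None  # .get() returns None instead of raising KeyError
absolute = urljoin(BASE_URL, relative) if relative else None

print(absolute)  # https://example.com/kopa-bostad/objekt/4MDBKG9311MD8M25
```

Using `.get("href")` rather than `['href']` keeps the script from crashing on an `<a>` tag that has no href.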

CodePudding user response：

BeautifulSoup lets you access a tag's attributes as if the tag were a dictionary, so you can get the href of the <a> tag like this:

from bs4 import BeautifulSoup

html = '<a data-v-0354ca3a="" href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light" tabindex="0" style="width: 100%;">'
soup = BeautifulSoup(html, 'html.parser')
for i in soup.find_all('a'):
    print(i['href'])

Output:

/kopa-bostad/objekt/4MDBKG9311MD8M25

Here, i in the for loop is the variable that holds each matched tag element.
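To make the dictionary-style access a bit more concrete, here is a small sketch. The second `<a>` tag (the one without an href) is made up for illustration and is not from the question's page:

```python
from bs4 import BeautifulSoup

# First tag is a shortened version of the one in the question;
# the second tag is a made-up anchor without an href
html = (
    '<a href="/kopa-bostad/objekt/4MDBKG9311MD8M25" tabindex="0">A</a>'
    '<a id="no-link">B</a>'
)
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("a")

print(first.attrs)         # all attributes of the tag as a plain dict
print(first["href"])       # dictionary-style lookup; raises KeyError if the attribute is missing
print(second.get("href"))  # None, because this tag has no href

# href=True keeps only the <a> tags that actually carry an href attribute
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)
```

Filtering with `href=True` is one way to avoid a KeyError when looping over every `<a>` on a real page.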

If you only want the <a> tag from your question out of a full HTML page source, pass the class_ parameter to the find_all method, or pass a dictionary that names the tag's class, like this:

for i in soup.find_all('a', class_='card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light'):
    print(i['href'])

for i in soup.find_all('a', {'class': 'card d-flex flex-column mb-8 v-card v-card--link v-sheet theme--light'}):
    print(i['href'])

Both snippets still output:

/kopa-bostad/objekt/4MDBKG9311MD8M25
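One caveat worth keeping in mind: a multi-class string passed to class_ is compared against the exact value of the class attribute, so it stops matching if the site reorders or adds a class. A possibly more robust sketch matches on one or two distinctive classes instead. The second link (`/om-oss`, `nav-link`) is invented here to stand in for the rest of a page:

```python
from bs4 import BeautifulSoup

# First tag is a shortened version of the one in the question;
# the second tag is a made-up navigation link
html = """
<a href="/kopa-bostad/objekt/4MDBKG9311MD8M25" class="card d-flex v-card v-card--link">Listing</a>
<a href="/om-oss" class="nav-link">About</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag that has it, whatever other classes are present
by_class = [a["href"] for a in soup.find_all("a", class_="v-card--link")]

# CSS selector: <a> tags with both classes (in any order) that also have an href
by_selector = [a["href"] for a in soup.select("a.card.v-card--link[href]")]

print(by_class)     # ['/kopa-bostad/objekt/4MDBKG9311MD8M25']
print(by_selector)  # ['/kopa-bostad/objekt/4MDBKG9311MD8M25']
```

Which classes are stable enough to rely on is an assumption you would need to check against the actual page.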

I hope this explanation clears up your doubts.

Let me know if this works for you.