Web Scraping using Beautifulsoup get text from <span> inside <div>

Time:11-04

I have code that scrapes real estate data. Part of it is the following:

for sqm in soup.find('ul', {'class': 'list-view real-estates'}).find_all('div', {'class': 'inline-group'}):
    sqm_value = sqm.get_text()
    sqm_area.append(sqm_value)

So far I get a big chunk of text for each listing, but I am only interested in the area. The problem is that the description containing the sq.m. value sometimes has one comma before the value and sometimes two.

https://i.stack.imgur.com/lFt3c.png

This is the link to the site from where I try to scrape the data: https://www.imoti.net/bg/obiavi/r/prodava/bulgaria/?page=1&sid=iXMpXe

I am looking for this value only (I can remove the 'м' from the string).

https://i.stack.imgur.com/uRbOf.png

Any ideas on how I can extend my code to get the sq.m. value?

CodePudding user response:

Just select the second <span> inside the <h3> that sits in each <div> with class inline-group, using select('h3 > span:nth-child(2)').
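A minimal sketch of that selector in use. The markup in SAMPLE_HTML is hypothetical, reconstructed only from the description above, so the real page may nest things differently; extract_areas is an illustrative helper name, not something from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): two listings, each with the area
# in the second <span> of the <h3> inside div.inline-group.
SAMPLE_HTML = """
<ul class="list-view real-estates">
  <li><div class="inline-group">
    <h3><span>продава Двустаен апартамент</span><span>65 м</span></h3>
  </div></li>
  <li><div class="inline-group">
    <h3><span>продава Къща</span><span>, 120 м</span></h3>
  </div></li>
</ul>
"""


def extract_areas(html):
    """Return the area text of every listing, without commas or the unit."""
    soup = BeautifulSoup(html, "html.parser")
    areas = []
    for group in soup.select("ul.list-view.real-estates div.inline-group"):
        # Second child of <h3>, provided that child is a <span>.
        span = group.select_one("h3 > span:nth-child(2)")
        if span is None:
            continue  # listing without an area, skip it
        text = span.get_text(strip=True)
        # Drop any leading commas/spaces and the trailing unit letter.
        areas.append(text.lstrip(", ").rstrip("м").strip())
    return areas


sqm_area = extract_areas(SAMPLE_HTML)
```

Skipping listings where select_one returns None keeps the loop from failing on entries that have no area, at the cost of sqm_area no longer lining up one-to-one with the listings.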

CodePudding user response:

Simply split the text on "," to get a list, then use [-1] to take the last element of that list:

sqm_value.split(",")[-1]
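As a rough sketch, the split can be wrapped in a small helper that also drops the unit. The name parse_sqm and the sample strings are assumptions made up for illustration; they are not taken from the site.

```python
def parse_sqm(description):
    """Return the text after the last comma, without the trailing 'м'."""
    last_part = description.split(",")[-1]
    # Trim whitespace, then the unit letter, then any space left before it.
    return last_part.strip().rstrip("м").strip()


# Hypothetical descriptions with one and two commas before the value.
samples = [
    "продава Двустаен апартамент, 65 м",
    "продава Къща, с. Бистрица, 120 м",
]
sqm_area = [parse_sqm(text) for text in samples]
```

Because [-1] always takes the last piece, the number of commas before the value does not matter. If a description has no comma at all, split returns the whole string, so the result would be the full description minus a trailing 'м'.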