I have code that scrapes real estate data. Part of it is the following:
for sqm in soup.find('ul', {'class': 'list-view real-estates'}).find_all('div', {'class': 'inline-group'}):
    sqm_value = sqm.get_text()
    sqm_area.append(sqm_value)
So far I get a big chunk of text for each listing, but I am only interested in the area. The problem is that the description containing the sq.m. value sometimes has one comma before the value and sometimes two.
https://i.stack.imgur.com/lFt3c.png
This is the site I am trying to scrape the data from: https://www.imoti.net/bg/obiavi/r/prodava/bulgaria/?page=1&sid=iXMpXe
I am looking for this value only (I can remove the 'м' from the string).
https://i.stack.imgur.com/uRbOf.png
Any ideas on how I can extend my code to get the sq.m. value?
CodePudding user response:
Just select the 2nd <span> inside the <h3>, which is in the <div> with class inline-group, using select('h3 > span:nth-child(2)').
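A minimal sketch of that selector in a loop. The HTML below is a made-up stand-in, not the real page, so the assumption that the area sits alone in the second <span> may not hold on the live site; SAMPLE_HTML and the listing titles are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<ul class="list-view real-estates">
  <li>
    <div class="inline-group">
      <h3><span>Apartment, Sofia</span><span>65 м</span></h3>
    </div>
  </li>
  <li>
    <div class="inline-group">
      <h3><span>House, Sofia, Lozenets</span><span>120 м</span></h3>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

sqm_area = []
for group in soup.select("ul.list-view.real-estates div.inline-group"):
    # Second <span> that is a direct child of the <h3>
    span = group.select_one("h3 > span:nth-child(2)")
    if span is None:
        continue  # skip listings that do not match the assumed layout
    sqm_area.append(span.get_text(strip=True))

print(sqm_area)
```

The None check is there so that a listing without a second <span> is skipped rather than raising an AttributeError.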
CodePudding user response:
Simply split the text on "," to get a list, then use [-1] to get the last element of that list:
sqm_value.split(",")[-1]
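Something like this would also handle the leftover whitespace and the trailing 'м'. It is only a sketch: extract_area is a hypothetical helper name, the sample strings are invented, and it assumes the area is always the last comma-separated field.

```python
def extract_area(text):
    """Return the area as a string, assuming it is the last comma-separated field."""
    last_field = text.split(",")[-1]  # works with one comma or two
    # Drop surrounding whitespace, then the trailing Cyrillic 'м' unit
    return last_field.strip().rstrip("м").strip()


# Invented sample descriptions with one and two commas before the value
print(extract_area("Apartment, 65 м"))
print(extract_area("House, Sofia, 120 м"))
```

Because [-1] always takes the last field, the number of commas before the value does not matter.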