I was trying to scrape a real estate website. The problem is that I can't combine my scraped variables into one dataset. Can anyone help me, please? Thank you!
Here is my code:
import requests
import pandas as pd
from bs4 import BeautifulSoup

html_text1=requests.get('https://www.propertyfinder.ae/en/search?c=1&ob=mr&page=1').content
soup1=BeautifulSoup(html_text1,'lxml')
listings=soup1.find_all('a',class_='card card--clickable')
for listing in listings:
    price=listing.find('p', class_='card__price').text.split()[0]
    price=price.split()[0]
    title=listing.find('h2', class_='card__title card__title-link').text
    property_type=listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    bedrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bathrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    location=listing.find('p', class_='card__location').text
    dataset=pd.DataFrame({property_type, price, title, bedrooms, bathrooms, location})
    print(dataset)
My output looks like this: [screenshot of the output]
However, I want it to look like a DataFrame:
Apartment | 162500 | ...
Townhouse | 162500 | ...
Villa | 7500000 | ...
Villa | 15000000 | ...
CodePudding user response:
The problem with your code is that you are creating the DataFrame inside the for loop, so each iteration builds a new frame from a single listing. Instead, append the values to separate lists inside the loop, then create one df from those lists after the loop.
Here's what the code will look like:
price_lst = []
title_lst = []
propertyType_lst = []
bedrooms_lst = []
bathrooms_lst = []
location_lst = []

listings = soup1.find_all('a',class_='card card--clickable')
for listing in listings:
    price = listing.find('p', class_='card__price').text.split()[0]
    price = price.split()[0]
    price_lst.append(price)
    title = listing.find('h2', class_='card__title card__title-link').text
    title_lst.append(title)
    property_type = listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    propertyType_lst.append(property_type)
    bedrooms = listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bedrooms_lst.append(bedrooms)
    bathrooms = listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    bathrooms_lst.append(bathrooms)
    location = listing.find('p', class_='card__location').text
    location_lst.append(location)

dataset = pd.DataFrame(list(zip(propertyType_lst, price_lst, title_lst, bedrooms_lst, bathrooms_lst, location_lst)),
                       columns = ['Property Type', 'Price', 'Title', 'Bedrooms', 'Bathrooms', 'Location'])
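As a variation on the same idea, the lists can also be handed to pandas as a dict of columns instead of being zipped into rows. The sketch below reuses the list names from the code above, but the values are made up for illustration and only three of the six columns are shown.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped lists.
propertyType_lst = ['Apartment', 'Villa']
price_lst = ['162500', '7500000']
title_lst = ['Sample title A', 'Sample title B']

# Each dict key becomes a column name and each list becomes that column.
# All lists must have the same length, otherwise pandas raises a ValueError.
dataset = pd.DataFrame({
    'Property Type': propertyType_lst,
    'Price': price_lst,
    'Title': title_lst,
})
print(dataset)
```

Both forms give the same frame; the dict form keeps each column name next to its data, which can make a mismatch easier to spot.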
CodePudding user response:
I would recommend working with a bit more structure: store the data of each iteration in a dict, collect those in a list of dicts, and create the data frame at the end:
data = []

for listing in listings:
    price=listing.find('p', class_='card__price').text.split()[0].split()[0]
    title=listing.find('h2').text
    property_type=listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    bedrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bathrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    location=listing.find('p', class_='card__location').text

    data.append({
        'price':price,
        'title':title,
        'property_type':property_type,
        'bedrooms':bedrooms,
        'bathrooms':bathrooms,
        'location':location
    })
Note: Also check your selections to avoid AttributeErrors, since find() returns None when nothing matches (the := operator below requires Python 3.8+):
title=t.text if (t:=listing.find('h2')) else None
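The same check can be wrapped in a small helper so that every field is handled the same way. This is only a sketch: `safe_text` is a hypothetical name, the HTML is made-up sample markup (not the real page), and `html.parser` is used so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

def safe_text(parent, name, **attrs):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, **attrs)
    # get_text(strip=True) also trims surrounding whitespace, unlike .text
    return tag.get_text(strip=True) if tag is not None else None

# Hypothetical markup: the second card has no price element.
html = """
<a class="card card--clickable"><h2>Villa A</h2><p class="card__price">100 AED</p></a>
<a class="card card--clickable"><h2>Villa B</h2></a>
"""
soup = BeautifulSoup(html, 'html.parser')

rows = []
for listing in soup.find_all('a', class_='card card--clickable'):
    rows.append({
        'title': safe_text(listing, 'h2'),
        'price': safe_text(listing, 'p', class_='card__price'),
    })
print(rows)
```

A missing element then shows up as None in the row instead of stopping the loop with an AttributeError.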
Example
from bs4 import BeautifulSoup
import requests
import pandas as pd

html_text1=requests.get('https://www.propertyfinder.ae/en/search?c=1&ob=mr&page=1').content
soup1=BeautifulSoup(html_text1,'lxml')
listings=soup1.find_all('a',class_='card card--clickable')

data = []

for listing in listings:
    price=listing.find('p', class_='card__price').text.split()[0]
    price=price.split()[0]
    title=listing.find('h2').text
    property_type=listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    bedrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bathrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    location=listing.find('p', class_='card__location').text

    data.append({
        'price':price,
        'title':title,
        'property_type':property_type,
        'bedrooms':bedrooms,
        'bathrooms':bathrooms,
        'location':location
    })

dataset=pd.DataFrame(data)
Output
| | price | title | property_type | bedrooms | bathrooms | location |
|---|---|---|---|---|---|---|
| 0 | 35,000,000 | Fully Upgraded \| Private Pool \| Prime Location | Villa | 6 | | District One Villas, District One, Mohammed Bin Rashid City, Dubai |
| 1 | 2,600,000 | Vacant \| Brand New and Ready \| Community View | Villa | 3 | | La Quinta, Villanova, Dubai Land, Dubai |
| 2 | 8,950,000 | Exclusive \| Newly Renovated \| Prime Location | Villa | 4 | | Jumeirah 3 Villas, Jumeirah 3, Jumeirah, Dubai |
| 3 | 3,500,000 | Brand New \| Single Row \| Vastu Compliant | Villa | 3 | | Azalea, Arabian Ranches 2, Dubai |
| 4 | 1,455,000 | Limited Units \| 3 Yrs Payment Plan \| La Violeta TH | Townhouse | 3 | | La Violeta 1, Villanova, Dubai Land, Dubai |
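The scraped values are all strings, so a price such as 35,000,000 cannot be sorted or summed as a number yet. If plain numbers are wanted, as in the question's desired output, one possible follow-up is sketched below. The frame here is a hypothetical stand-in with the same column names, not real scraped data.

```python
import pandas as pd

# Hypothetical rows shaped like the scraped output; every value is a string.
dataset = pd.DataFrame({
    'price': ['35,000,000', '2,600,000'],
    'bedrooms': ['6', '3'],
})

# Drop the thousands separators, then convert to numbers.
# errors='coerce' turns any non-numeric text into NaN instead of raising.
dataset['price'] = pd.to_numeric(
    dataset['price'].str.replace(',', '', regex=False), errors='coerce'
)
dataset['bedrooms'] = pd.to_numeric(dataset['bedrooms'], errors='coerce')
print(dataset.dtypes)
```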