How to make a dataset from web-scraped variables?

Time:02-22

I am trying to scrape a real estate website. The problem is that I can't combine my scraped variables into one dataset. Can anyone help me, please? Thank you!

Here is my code:

html_text1=requests.get('https://www.propertyfinder.ae/en/search?c=1&ob=mr&page=1').content
soup1=BeautifulSoup(html_text1,'lxml')

listings=soup1.find_all('a',class_='card card--clickable')
for listing in listings:
 price=listing.find('p', class_='card__price').text.split()[0]
 price=price.split()[0]
 title=listing.find('h2', class_='card__title card__title-link').text
 property_type=listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
 bedrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
 bathrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
 location=listing.find('p', class_='card__location').text

 dataset=pd.DataFrame({property_type, price, title, bedrooms, bathrooms, location})
 print(dataset)

My output looks like this: (image not included)

However, I want it to look like a DataFrame:

Apartment | 162500 | ...

Townhouse | 162500 | ...

Villa | 7500000 | ...

Villa | 15000000 | ...

CodePudding user response:

The problem with your code is that you create the DataFrame inside the for loop, so each iteration builds a new frame from a single listing. Instead, collect the values in separate lists during the loop and create the DataFrame from those lists once the loop has finished.

Here's what the code will look like:

price_lst = []
title_lst = []
propertyType_lst = []
bedrooms_lst = []
bathrooms_lst = []
location_lst = []


listings = soup1.find_all('a',class_='card card--clickable')
for listing in listings:
    # Keep only the first whitespace-separated token of the price text
    price = listing.find('p', class_='card__price').text.split()[0]
    price_lst.append(price)

    title = listing.find('h2', class_='card__title card__title-link').text
    title_lst.append(title)

    property_type = listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    propertyType_lst.append(property_type)
    
    bedrooms = listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bedrooms_lst.append(bedrooms)

    bathrooms = listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    bathrooms_lst.append(bathrooms)

    location = listing.find('p', class_='card__location').text
    location_lst.append(location)

# Build the DataFrame once, after the loop, with one row per listing
dataset = pd.DataFrame(
    list(zip(propertyType_lst, price_lst, title_lst, bedrooms_lst, bathrooms_lst, location_lst)),
    columns=['Property Type', 'Price', 'Title', 'Bedrooms', 'Bathrooms', 'Location'])
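A related detail in the question's code is the braces in pd.DataFrame({property_type, price, ...}): without keys, a brace literal in Python is a set, not a dict, so no column names are attached to the values. The sketch below uses made-up values only to show the difference between the two literals.

```python
# Made-up sample values standing in for one scraped listing
property_type, price = "Villa", "7500000"

# Braces without keys build a set: unordered, and no column names
as_set = {property_type, price}

# Braces with key: value pairs build a dict that maps column names to values
as_dict = {"property_type": property_type, "price": price}

print(type(as_set).__name__, type(as_dict).__name__)
```

A dict like as_dict (or a list of such dicts) is what gives each value a named column when it is passed to pd.DataFrame.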

CodePudding user response:

I would recommend working with a bit more structure: use a dict per listing, collect the dicts in a list during the iteration, and create the DataFrame at the end:

data = []

for listing in listings:
    price=listing.find('p', class_='card__price').text.split()[0]
    title=listing.find('h2').text
    property_type=listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    bedrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bathrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    location=listing.find('p', class_='card__location').text
    
    data.append({
        'price':price,
        'title':title,
        'property_type':property_type,
        'bedrooms':bedrooms,
        'bathrooms':bathrooms,
        'location':location
    })

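Note: Also check your selections to avoid an AttributeError when an element is missing from a listing (the assignment expression used here requires Python 3.8 or later):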

title=t.text if (t:=listing.find('h2')) else None
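The same guard can be wrapped in a small helper so that every field is handled the same way. The following is a minimal sketch, not part of the original answer: text_or_none is a hypothetical helper name, and the inline HTML is made-up sample markup that only borrows the class names from the question. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

def text_or_none(parent, name, class_=None):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, class_=class_) if class_ else parent.find(name)
    return tag.get_text(strip=True) if tag else None

# Made-up sample markup; the second card has no price element
html = """
<a class="card card--clickable"><h2>Villa A</h2><p class="card__price">1,000 AED</p></a>
<a class="card card--clickable"><h2>Villa B</h2></a>
"""
soup = BeautifulSoup(html, "html.parser")

rows = [
    {
        "title": text_or_none(card, "h2"),
        "price": text_or_none(card, "p", "card__price"),
    }
    for card in soup.find_all("a", class_="card card--clickable")
]
print(rows)
```

A missing element then shows up as None in the row instead of stopping the loop with an AttributeError.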

Example

from bs4 import BeautifulSoup
import requests
import pandas as pd

html_text1=requests.get('https://www.propertyfinder.ae/en/search?c=1&ob=mr&page=1').content
soup1=BeautifulSoup(html_text1,'lxml')

listings=soup1.find_all('a',class_='card card--clickable')

data = []

for listing in listings:
    price=listing.find('p', class_='card__price').text.split()[0]
    title=listing.find('h2').text
    property_type=listing.find('p',class_='card__property-amenity card__property-amenity--property-type').text
    bedrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bedrooms').text
    bathrooms=listing.find('p', class_='card__property-amenity card__property-amenity--bathrooms').text
    location=listing.find('p', class_='card__location').text
    
    data.append({
        'price':price,
        'title':title,
        'property_type':property_type,
        'bedrooms':bedrooms,
        'bathrooms':bathrooms,
        'location':location
    })

dataset=pd.DataFrame(data)

Output

price title property_type bedrooms bathrooms location
0 35,000,000 Fully Upgraded Private Pool Prime Location Villa 6 District One Villas, District One, Mohammed Bin Rashid City, Dubai
1 2,600,000 Vacant Brand New and Ready Community View Villa 3 La Quinta, Villanova, Dubai Land, Dubai
2 8,950,000 Exclusive Newly Renovated Prime Location Villa 4 Jumeirah 3 Villas, Jumeirah 3, Jumeirah, Dubai
3 3,500,000 Brand New Single Row Vastu Compliant Villa 3 Azalea, Arabian Ranches 2, Dubai
4 1,455,000 Limited Units 3 Yrs Payment Plan La Violeta TH Townhouse 3 La Violeta 1, Villanova, Dubai Land, Dubai
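The prices in this output are still strings with thousands separators, while the table in the question shows plain numbers. One possible way to convert them is sketched below; parse_price is a hypothetical helper, and the non-numeric sample value is an assumption about what a listing without a numeric price might contain.

```python
def parse_price(text):
    """Convert a price string such as '35,000,000' to an int, or None if not numeric."""
    if text is None:
        return None
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None

# Two prices from the output above, plus assumed edge cases
prices = [parse_price(p) for p in ["35,000,000", "2,600,000", "Ask", None]]
print(prices)
```

The helper could be applied to price inside the loop before the value is appended to data.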