https://www.ejendomstorvet.dk/ledigelokaler/koebenhavn-by/detailhandel-butik
I tried to scrape this real estate listing page. It reports 241 results, but after several attempts I can only scrape 12 of them.
from bs4 import BeautifulSoup
import requests
from csv import writer

url = "https://www.ejendomstorvet.dk/ledigelokaler/detailhandel-butik"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Each listing is wrapped in a <div class="propcontainer">
lists = soup.find_all('div', class_="propcontainer")

with open('ejendomstorvet_butik_03-140322.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['title', 'address_01', 'address_02', 'size', 'price', 'link']
    thewriter.writerow(header)
    for e in lists:
        title = e.find('div', class_="prop__intro").text.replace('\r\n', '')
        address_01 = e.find('div', class_="prop__address").text.replace('\r\n', '')
        address_02 = e.find('div', class_="prop__address2").text.replace('\r\n', '')
        size = e.find('span', class_="prop__size").text.replace('\r\n', '')
        price = e.find('span', class_="prop__price").text.replace('\r\n', '')
        # Store the href value rather than the whole <a> tag
        link = e.find('a', href=True)['href']
        info = [title, address_01, address_02, size, price, link]
        thewriter.writerow(info)
CodePudding user response:
The listings are populated dynamically by JavaScript from the JSON response of an API call, which is why scraping the static HTML yields only 12 results. Here is an example that reads the results from that API instead.
Script:
import requests
import json
import pandas as pd
cookies={'Cookie': 'ASP.NET_SessionId=unma1hbvnwdd53w0iqfh5mfl; search=id=3ee38b0e-551b-4190-a617-cf0dd020d99a&itemtype=OwnUse&url=/ledigelokaler/koebenhavn-by/detailhandel-butik&convertedsearch=0; usercookie=id=f3c5912a-aa72-4a78-bff0-826eaab28072&c=MDMvMTQvMjAyMiAxOTo1NDoxNw==&data=JGYzYzU5MTJhLWFhNzItNGE3OC1iZmYwLTgyNmVhYWIyODA3MgCAo5vl0gXaiAEBJGYzYzU5MTJhLWFhNzItNGE3OC1iZmYwLTgyNmVhYWIyODA3Mglhbm9ueW1vdXMA; settings=usersettings=AAEAAAAAAAZsYXRlc3QABmNsb3NlZAA=; prism_610987956=e5c78891-283e-43ad-a8d9-43db0a7dd452; _clck=1xe2hka|1|ezr|0; CookieInformationConsent={"website_uuid":"81ea14e2-c192-405e-8523-46a926c64030","timestamp":"2022-03-14T15:55:15.512Z","consent_url":"https://www.ejendomstorvet.dk/ledigelokaler/koebenhavn-by/detailhandel-butik","consent_website":"Ejendomstorvet.dk","consent_domain":"www.ejendomstorvet.dk","user_uid":"7af36c12-088f-441a-a76a-1a52df8f229a","consents_approved":["cookie_cat_necessary","cookie_cat_functional","cookie_cat_statistic","cookie_cat_marketing","cookie_cat_unclassified"],"consents_denied":[],"user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"}; _gcl_au=1.1.273465374.1647273316; _gid=GA1.2.924375637.1647273316; _ga_0Q5HR8S1F9=GS1.1.1647273305.1.1.1647273357.18; _ga=GA1.2.1641080384.1647273306; _uetsid=073761a0a3af11ecacdf8b5661088ce4; _uetvid=07379f70a3af11ecbff1db1567aff869; _clsk=nd9336|1647273413426|2|1|d.clarity.ms/collect'}
# Mark the request as an XHR call, as the page's own JavaScript does
headers = {'X-Requested-With': 'XMLHttpRequest'}
api_url = "https://www.ejendomstorvet.dk/search/result?gethighlighted=false&imagewidth=620&imageheight=400"
jsonData = requests.get(api_url, headers=headers, cookies=cookies).json()

data = []
for page in range(1, 17, 1):
    #print(page)
    # NOTE: this only modifies the local dict; no new request is sent,
    # so every pass iterates over the same PropertyResultList.
    jsonData['NumberOfPages'] = page
    for item in jsonData['PropertyResultList']:
        title = item['Flashline']
        url = item['RefUrl']
        address = item['Address']
        city = item['City']
        data.append([title, url, address, city])
        #print(title)

cols = ['title', 'url', 'address', 'city']
df = pd.DataFrame(data, columns=cols)
print(df)
#df.to_csv('output.csv', index=False)  # to store data
Output:
title ... city
0 Nyopført ejendom ved Ny Ellebjerg st. ... København SV
1 Nyopført ejendom ved Ny Ellebjerg st. ... København SV
2 Nyopført ejendom ved Ny Ellebjerg St. ... København SV
3 Nyopført ejendom ved Ny Ellebjerg St. ... København SV
4 Nyopført ejendom ved Ny Ellebjerg St. ... København SV
.. ... ... ...
187 Strandgade 7, 1401 København K ... København K
188 Centralt beliggende erhvervslokale på Frederik... ... Frederiksberg C
189 Charmerende café ... København Ø
190 Velbeliggende mindre butik/café centralt i Øre... ... København S
191 Produktionskøkken til leje på Frederiksberg ... Frederiksberg
[192 rows x 4 columns]
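
The 192 rows above correspond to 16 passes over a 12-item list: the script sends a single request and then loops over the same `PropertyResultList`. Below is a minimal sketch of requesting each page separately and skipping repeats. It rests on two assumptions that the thread does not confirm: that the API accepts a page-number query parameter (the name `page` is a guess; the real name should be read from the browser's network tab), and that `RefUrl` is unique per listing. `make_fetcher`, `collect_all`, `page_param` and `max_pages` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


def make_fetcher(session, api_url: str, headers: Dict[str, str],
                 page_param: str = "page") -> Callable[[int], dict]:
    """Build a function that downloads one page of results as a dict.

    `session` is expected to behave like requests.Session.
    `page_param` is an ASSUMED query-parameter name, not a confirmed one.
    """
    def fetch(page: int) -> dict:
        response = session.get(api_url, params={page_param: page}, headers=headers)
        return response.json()
    return fetch


def collect_all(fetch_page: Callable[[int], dict],
                max_pages: int = 50) -> List[List[str]]:
    """Fetch pages 1..max_pages, stopping once a page adds nothing new."""
    rows: List[List[str]] = []
    seen = set()  # RefUrl values already stored (assumed unique per listing)
    for page in range(1, max_pages + 1):
        items = fetch_page(page).get("PropertyResultList") or []
        added = 0
        for item in items:
            if item["RefUrl"] in seen:
                continue  # duplicate listing, skip it
            seen.add(item["RefUrl"])
            rows.append([item["Flashline"], item["RefUrl"],
                         item["Address"], item["City"]])
            added += 1
        if added == 0:
            break  # empty page or only repeats: treat as the end
    return rows
```

With `requests`, this could be called as `collect_all(make_fetcher(requests.Session(), api_url, headers))`, and the returned rows passed to `pd.DataFrame(rows, columns=cols)` as in the script. If the row count stays at 12, the assumed parameter name is probably wrong, because the server then returns the same first page each time and the loop stops after the second request.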