So im trying to make a data science project using information from this site. But sadly when I try to scrape it, it blocks me because it thinks I am a bot. I saw a couple of post here: Python webscraping blocked but it seems that Immoscout have already found a solution to this workaround. Does somebody know how I can come around this? thanks!
My Code:
import requests
from bs4 import BeautifulSoup
import random
headers = {"User-Agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; he-il; NEO-X5-116A Build/JDQ39) AppleWebKit/534.30 ("
"KHTML, like Gecko) Version/4.0 Safari/534.30 , 'Accept-Language': 'en-US,en;q=0.5'"}
url = "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-kaufen?enteredFrom=one_step_search"
response = requests.get(url, cookies={'required_cookie': 'reese84=xxx'} ,headers=headers)
webpage = response.content
print(response.status_code)
soup = BeautifulSoup(webpage, "html.parser")
print(soup.prettify)
thanks :)
CodePudding user response:
Data is generating dynamically from API calls json response as POST method and You can extract data using only requests
module.So,You can follow the next example.
import requests
headers= {
'content-type': 'application/json',
'x-requested-with': 'XMLHttpRequest'
}
api_url = "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-kaufen?pagenumber=1"
jsonData = requests.post(api_url).json()
for item in jsonData['searchResponseModel']['resultlist.resultlist']['resultlistEntries'][0]['resultlistEntry']:
value=item['attributes'][0]['attribute'][0]['value'].replace('€','').replace('.',',')
print(value)
Output:
4,350,000
285,000
620,000
590,000
535,000
972,500
579,000
1,399,900
325,000
749,000
290,000
189,900
361,825
199,900
299,000
195,000
1,225,000
199,000
825,000
315,000