Can't scrape listing links from a webpage using the requests module

Time:02-03

I'm trying to scrape the listings for the search "Oxford, Oxfordshire" from this webpage using the requests module. (The original post included a screenshot of the input box before clicking the search button.)

I've defined an accurate selector to locate the listings, but the script fails to grab any data.

import requests
from pprint import pprint
from bs4 import BeautifulSoup

link = 'https://www.zoopla.co.uk/search/'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,bn;q=0.8',
    'Referer': 'https://www.zoopla.co.uk/for-sale/',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
}
params = {
    'view_type': 'list',
    'section': 'for-sale',
    'q': 'Oxford, Oxfordshire',
    'geo_autocomplete_identifier': 'oxford',
    'search_source': 'home'
}
res = requests.get(link, params=params, headers=headers)
soup = BeautifulSoup(res.text, "html5lib")
for item in soup.select("[id^='listing'] a[href^='/for-sale/details/']:has(h2[data-testid='listing-title'])"):
    print(item.get("href"))
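
The selector itself is fine; the listings simply are not in the HTML that `requests` receives, because the page builds them client-side with JavaScript. You can confirm this kind of failure with a small sketch against hypothetical markup (the card structure below is invented to mimic one rendered listing):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking one fully rendered listing card
rendered = """
<div id="listing_123">
  <a href="/for-sale/details/123/">
    <h2 data-testid="listing-title">2 bed flat for sale</h2>
  </a>
</div>
"""

# What a JavaScript-rendered page typically returns before scripts run
empty_shell = "<div id='app'></div>"

sel = ("[id^='listing'] a[href^='/for-sale/details/']"
       ":has(h2[data-testid='listing-title'])")

# The selector matches when the markup is present...
print(len(BeautifulSoup(rendered, "html.parser").select(sel)))     # 1
# ...and matches nothing when the data is injected later by JS
print(len(BeautifulSoup(empty_shell, "html.parser").select(sel)))  # 0
```

If the second count is 0 against the real `res.text` as well, the data is being rendered in the browser and a plain GET cannot see it.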

CodePudding user response:

This page renders its DOM with JavaScript in the browser, so `requests` never sees the listings. You need to drive a real browser, for example with Selenium:

  1. pip install selenium
  2. pip install webdriver-manager
  3. Download the Firefox browser from here or click here.
  4. Then use the code below:

import requests
from pprint import pprint
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver


link = 'https://www.zoopla.co.uk/search/'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,bn;q=0.8',
    'Referer': 'https://www.zoopla.co.uk/for-sale/',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
}
params = {
    'view_type': 'list',
    'section': 'for-sale',
    'q': 'Oxford, Oxfordshire',
    'geo_autocomplete_identifier': 'oxford',
    'search_source': 'home'
}


options = webdriver.FirefoxOptions()
options.add_argument('-headless')  # `options.headless = True` was removed in Selenium 4.10+
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()), options=options)

# The requests call is only used to resolve the final search URL (res.url)
res = requests.get(link, params=params, headers=headers)


driver.get(res.url)

# Wait for an actual listing card rather than <html>, which is always present
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "[id^='listing']"))
)
html = driver.page_source

print(res.url)
soup = BeautifulSoup(html, "html.parser")
with open('aa.html', 'w', encoding='utf-8') as f:
    f.write(html)
for item in soup.select("[id^='listing'] a[href^='/for-sale/details/']:has(h2[data-testid='listing-title'])"):
    print(item.get("href"))

driver.quit()
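
As an aside, many JavaScript-heavy sites embed their initial state as JSON in the served HTML (Next.js sites expose it in a `<script id="__NEXT_DATA__">` tag), which plain `requests` can read without a browser. A hedged sketch against hypothetical markup follows; the tag name may differ on the real site, and the key path below is invented, so inspect the actual payload before relying on it:

```python
import json
from bs4 import BeautifulSoup

# Hypothetical server response embedding page state the way Next.js does;
# the JSON structure below is invented for illustration only
html = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"listings": [{"listingId": "123"}]}}}
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("script", id="__NEXT_DATA__")
data = json.loads(tag.string)

# Inspect the real payload first; this key path is an assumption
for listing in data["props"]["pageProps"]["listings"]:
    print(listing["listingId"])
```

If such a tag exists in the real response, this approach avoids Selenium entirely and is far faster than driving a headless browser.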

CodePudding user response:

I have built an efficient Zoopla scraper using Python Scrapy. It can extract all Zoopla listings and their relevant details. If you are located outside the UK (I have signed a non-disclosure agreement with a UK company), I would be happy to offer my expertise and help with your scraping needs.
