I'm trying to scrape a real estate website using BeautifulSoup. I'm trying to get a list of rental prices for London. This works but only for the first page on the website. There are over 150 of them so I'm missing out on a lot of data. I would like to be able to collect all the prices from all the pages. Here is the code I'm using:
import requests
from bs4 import BeautifulSoup as soup
url = 'https://www.zoopla.co.uk/to-rent/property/central-london/?beds_max=5&price_frequency=per_month&q=Central London&results_sort=newest_listings&search_source=home'
response = requests.get(url)
response.status_code
data = soup(response.content, 'lxml')
prices = []
for line in data.findAll('div', {'class': 'css-1e28vvi-PriceContainer e2uk8e7'}):
price = str(line).split('>')[2].split(' ')[0].replace('£', '').replace(',','')
price = int(price)
prices.append(price)
Any idea as to why I can't collect the prices from all the pages using this script?
Extra question : is there a way to access the price using soup, IE with doing any list/string manipulation? When I call data.find('div', {'class': 'css-1e28vvi-PriceContainer e2uk8e7'})
I get a string of the following form <div class="css-1e28vvi-PriceContainer e2uk8e7" data-testid="listing-price"><p class="css-1o565rw-Text eczcs4p0" size="6">£3,012 pcm</p></div>
Any help would be much appreciated!
CodePudding user response:
You can append &pn=<page number>
parameter to the URL to get next pages:
import re
import requests
from bs4 import BeautifulSoup as soup
url = "https://www.zoopla.co.uk/to-rent/property/central-london/?beds_max=5&price_frequency=per_month&q=Central London&results_sort=newest_listings&search_source=home&pn="
prices = []
for page in range(1, 3): # <-- increase number of pages here
data = soup(requests.get(url str(page)).content, "lxml")
for line in data.findAll(
"div", {"class": "css-1e28vvi-PriceContainer e2uk8e7"}
):
price = line.get_text(strip=True)
price = int(re.sub(r"[^\d]", "", price))
prices.append(price)
print(price)
print("-" * 80)
print(len(prices))
Prints:
...
1993
1993
--------------------------------------------------------------------------------
50