bs4 findAll not collecting all of the data from the other pages on the website

I'm trying to scrape a real estate website with BeautifulSoup to get a list of rental prices for London. My code works, but only for the first page of results. There are over 150 pages, so I'm missing most of the data. I would like to collect the prices from all of them. Here is the code I'm using:

import requests
from bs4 import BeautifulSoup as soup

url = 'https://www.zoopla.co.uk/to-rent/property/central-london/?beds_max=5&price_frequency=per_month&q=Central London&results_sort=newest_listings&search_source=home'
response = requests.get(url)
response.status_code

data = soup(response.content, 'lxml')

prices = []
for line in data.findAll('div', {'class': 'css-1e28vvi-PriceContainer e2uk8e7'}):
    price = str(line).split('>')[2].split(' ')[0].replace('£', '').replace(',','')
    price = int(price)
    prices.append(price)

Any idea as to why I can't collect the prices from all the pages using this script?

Extra question: is there a way to access the price using soup, i.e. without doing any list/string manipulation? When I call data.find('div', {'class': 'css-1e28vvi-PriceContainer e2uk8e7'}) I get an element of the following form: <div class="css-1e28vvi-PriceContainer e2uk8e7" data-testid="listing-price"><p class="css-1o565rw-Text eczcs4p0" size="6">£3,012 pcm</p></div>

Any help would be much appreciated!

CodePudding user response：

Your script requests only one URL, so it only ever sees the first page of results. You can append the &pn=<page number> parameter to the URL to get the following pages:

import re
import requests
from bs4 import BeautifulSoup as soup

url = "https://www.zoopla.co.uk/to-rent/property/central-london/?beds_max=5&price_frequency=per_month&q=Central London&results_sort=newest_listings&search_source=home&pn="

prices = []
for page in range(1, 3):  # <-- increase number of pages here
    data = soup(requests.get(url + str(page)).content, "lxml")

    for line in data.findAll(
        "div", {"class": "css-1e28vvi-PriceContainer e2uk8e7"}
    ):
        price = line.get_text(strip=True)
        price = int(re.sub(r"[^\d]", "", price))
        prices.append(price)
        print(price)
    print("-" * 80)

print(len(prices))

Prints:


...

1993
1993
--------------------------------------------------------------------------------
50
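
On the extra question: the markup you posted also carries a data-testid="listing-price" attribute, so one option is to select on that instead of the CSS class names, which look auto-generated and may change. Below is a minimal sketch under that assumption. The helper names page_url and extract_prices are made up for illustration, and the sample HTML is the snippet from the question, so the block runs without touching the network. It uses the built-in "html.parser" rather than "lxml" only to avoid the extra dependency.

```python
import re
from urllib.parse import urlencode

from bs4 import BeautifulSoup

BASE_URL = "https://www.zoopla.co.uk/to-rent/property/central-london/"


def page_url(page):
    """Build the search URL for one results page (pn is the page number)."""
    params = {
        "beds_max": 5,
        "price_frequency": "per_month",
        "q": "Central London",
        "results_sort": "newest_listings",
        "search_source": "home",
        "pn": page,
    }
    # urlencode escapes the space in "Central London" for us.
    return BASE_URL + "?" + urlencode(params)


def extract_prices(html):
    """Return every listing price found in the HTML as an int."""
    doc = BeautifulSoup(html, "html.parser")
    prices = []
    # Assumption: each price container has data-testid="listing-price",
    # as in the snippet from the question.
    for tag in doc.find_all(attrs={"data-testid": "listing-price"}):
        digits = re.sub(r"\D", "", tag.get_text(strip=True))
        if digits:  # skip containers without a numeric price
            prices.append(int(digits))
    return prices


# Sample markup copied from the question.
sample = (
    '<div class="css-1e28vvi-PriceContainer e2uk8e7" data-testid="listing-price">'
    '<p class="css-1o565rw-Text eczcs4p0" size="6">£3,012 pcm</p></div>'
)
print(extract_prices(sample))
print(page_url(2))
```

To use it on the live site, fetch each page_url(page) with requests.get as in the loop above and pass response.content to extract_prices. Some string cleanup is still needed because the tag text is "£3,012 pcm" rather than a bare number, but get_text() removes the need to split the raw HTML.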