I'm trying to extract the pagination number from a webpage and have tried several methods, all to no avail.
What is the right method? Please also explain why the following methods do not extract the information as requested.
First method:
import requests
from bs4 import BeautifulSoup

for i in range(0, 48, 24):
    url = f'https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=STATION^1712&maxPrice=500000&radius=0.5&sortType=10&propertyTypes=&mustHave=&dontShow=&index={i}&furnishTypes=&keywords='
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    page = soup.select('span[]')
    print(page)
returns:
[]
[]
I've also tried:
1. page = soup.find('span', {'data-bind':'text: total'})
2. page = soup.select("[class~=pagination-pageInfo]")
both of which return nothing, and
3. page = soup.select('span', {'data-bind':'text: total'})
which returns a bunch of unrelated elements and not the pagination number.
How do I get the pagination number at the bottom of the page? Expected output:
1
2
CodePudding user response:
There is no pagination element in the DOM tree you get, because this data is loaded by JavaScript. You have two options:
- You can use Selenium and keep your current approach (searching for the element with the span[] selector).
- You can still use requests for your purpose, because all the page data, including the pagination, is in the JSON at the bottom of the page HTML. You can easily extract it with a regular expression. Full code:
import json
import re

import requests

for i in range(0, 48, 24):
    url = f'https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=STATION^1712&maxPrice=500000&radius=0.5&sortType=10&propertyTypes=&mustHave=&dontShow=&index={i}&furnishTypes=&keywords='
    r = requests.get(url)
    html = r.text
    # The page data is embedded as a JSON object assigned to window.jsonModel
    full_data_json = json.loads(re.search(r'window\.jsonModel = (.*)</script>', html).group(1))
    print(full_data_json["pagination"]["page"])
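
If you want to check the extraction step without hitting the site, here is a minimal offline sketch of the same regex idea. The SAMPLE_HTML string and the extract_json_model helper are made up for illustration, and the JSON inside is a guess at the general shape, not the real payload. The helper uses a non-greedy match and returns None when the script tag is missing, so a changed page layout does not crash with an AttributeError.

```python
import json
import re

# Hypothetical, trimmed-down stand-in for the HTML that requests receives.
# Note there is no pagination <span> in the static markup, only the JSON blob.
SAMPLE_HTML = """
<html><body>
<div id="results"></div>
<script>window.jsonModel = {"pagination": {"page": "2", "total": 2}, "properties": []}</script>
</body></html>
"""


def extract_json_model(html):
    """Return the object assigned to window.jsonModel, or None if it is absent."""
    # Non-greedy so the match stops at the first closing script tag.
    match = re.search(r'window\.jsonModel = (.*?)</script>', html)
    if match is None:
        return None
    return json.loads(match.group(1))


model = extract_json_model(SAMPLE_HTML)
print(model["pagination"]["page"])
```

In the real loop you would pass r.text to the helper and skip the page (or log it) when the result is None.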