Web scraping multiple pages in python

Time:08-03

So I'm trying to web scrape a website that has around 500 pages of used cars, with around 22 cars per page. I managed to extract the first 22 cars from the first page, but how can I make my code iterate through all the pages so I can get all the cars? (I'm a beginner, so sorry if my code is not well structured.)

from bs4 import BeautifulSoup 
import requests
import pandas as pd
import numpy as np

website = 'https://ksa.yallamotor.com/used-cars/search'

headers = {
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0'
}

response = requests.get(website, headers=headers)

links = []
car_name = []
model_year = []
cars = []

soup = BeautifulSoup(response.text, 'lxml')
cars = soup.find_all('div', class_='singleSearchCard m24t p12 bg-w border-gray border8')

for c in cars:
    l = "https://ksa.yallamotor.com" + c.find('a', class_='black-link')['href']
    links.append(l)


session_object = requests.Session()  # reuse one session for all detail pages

for url in links:
    result = session_object.get(url, headers=headers)
    soup = BeautifulSoup(result.text, 'lxml')

    name = soup.find('h1', class_="font24")
    car_name.append(name.text)

    y = soup.find_all('div', class_="font14 text-center font-b m2t")[0]
    model_year.append(y.text)
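The usual way to cover every page is to request each search-results page in turn and run the link-extraction step on each one before visiting the detail pages. Assuming the site paginates with a `page` query parameter (the URL used in the answer, `search?page={x}&sort=updated_desc`, suggests it does), a minimal sketch for generating those URLs follows. `search_page_urls` and `SEARCH_URL` are hypothetical names, and `sort=updated_desc` is copied from the answer rather than confirmed as required.

```python
from urllib.parse import urlencode

# Hypothetical constant: the search URL from the question.
SEARCH_URL = "https://ksa.yallamotor.com/used-cars/search"


def search_page_urls(last_page, sort="updated_desc"):
    """Return one search-results URL per page, from page 1 to last_page inclusive.

    Assumes the site paginates with a `page` query parameter.
    """
    return [
        f"{SEARCH_URL}?{urlencode({'page': page, 'sort': sort})}"
        for page in range(1, last_page + 1)
    ]


if __name__ == "__main__":
    for url in search_page_urls(3):
        print(url)
```

Each URL would then be fetched with the same headers and parsed with the same `find_all` call used for the first page, appending to `links` before the detail-page loop runs.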

Answer:

The website is protected by Cloudflare, so plain requests calls may be blocked; you need something like cloudscraper (pip install cloudscraper). The following code loops over all 500 result pages and prints each car's title and link (you can then analyse each car further and extract the details you need):

import cloudscraper
from bs4 import BeautifulSoup

scraper = cloudscraper.create_scraper()

for x in range(1, 501):
    r = scraper.get(f'https://ksa.yallamotor.com/used-cars/search?page={x}&sort=updated_desc')
    soup = BeautifulSoup(r.text, 'html.parser')
    cars = soup.select('.singleSearchCard')
    for car in cars:
        url = car.select_one('a.black-link')       
        print(url.get_text(strip=True), url['href'])
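Hardcoding `range(1, 501)` breaks if the number of pages changes. A common alternative is to keep requesting pages until one comes back with no results. The sketch below is generic and assumes the site returns an empty results page past the last one (if it instead repeats the last page or redirects, a different stop condition is needed). `scrape_until_empty` is a hypothetical helper, `get_items(page)` is a callable you supply (for example, one that calls `scraper.get(...)` and returns `soup.select('.singleSearchCard')`), and the `max_pages` cap and `delay` pause are assumed safety limits, not values taken from the site.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def scrape_until_empty(
    get_items: Callable[[int], List[T]],
    max_pages: int = 1000,
    delay: float = 1.0,
) -> List[T]:
    """Collect items page by page, stopping at the first empty page.

    get_items(page) must return the items found on that page (an empty list
    once there are no more results). max_pages guards against an endless loop.
    """
    collected: List[T] = []
    for page in range(1, max_pages + 1):
        items = get_items(page)
        if not items:
            break  # no cards on this page: assume we are past the last page
        collected.extend(items)
        if delay:
            time.sleep(delay)  # pause between requests to go easy on the server
    return collected


if __name__ == "__main__":
    # Stand-in for real requests: three pages of fake results, then nothing.
    fake_pages = {1: ["car-a", "car-b"], 2: ["car-c"], 3: ["car-d"]}
    print(scrape_until_empty(lambda p: fake_pages.get(p, []), delay=0))
```

With real requests, `get_items` would wrap the two lines inside the answer's loop (the `scraper.get` call and the `soup.select('.singleSearchCard')` call) and return the list of cards.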

Result printed in terminal:

Used BMW 7 Series  730Li 2018 /used-cars/bmw/7-series/2018/used-bmw-7-series-2018-jeddah-1294758
Used Infiniti QX80  5.6L Luxe (8 Seats) 2020 /used-cars/infiniti/qx80/2020/used-infiniti-qx80-2020-jeddah-1295458
Used Chevrolet Suburban  5.3L LS 2WD 2018 /used-cars/chevrolet/suburban/2018/used-chevrolet-suburban-2018-jeddah-1302084
Used Chevrolet Silverado 2016 /used-cars/chevrolet/silverado/2016/used-chevrolet-silverado-2016-jeddah-1297430
Used GMC Yukon  5.3L SLE (2WD) 2018 /used-cars/gmc/yukon/2018/used-gmc-yukon-2018-jeddah-1304469
Used GMC Yukon  5.3L SLE (2WD) 2018 /used-cars/gmc/yukon/2018/used-gmc-yukon-2018-jeddah-1304481
Used Chevrolet Impala  3.6L LS 2018 /used-cars/chevrolet/impala/2018/used-chevrolet-impala-2018-jeddah-1297427
Used Infiniti Q70  3.7L Luxe 2019 /used-cars/infiniti/q70/2019/used-infiniti-q70-2019-jeddah-1295235
Used Chevrolet Tahoe  LS 2WD 2018 /used-cars/chevrolet/tahoe/2018/used-chevrolet-tahoe-2018-jeddah-1305486
Used Mercedes-Benz 450 SEL 2018 /used-cars/mercedes-benz/450-sel/2018/used-mercedes-benz-450-sel-2018-jeddah-1295830
[...]
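The hrefs in this output are relative, and the sample paths appear to follow the pattern `/used-cars/<make>/<model>/<year>/<slug>`. Assuming that pattern holds for every listing (it is inferred from these lines only, not documented by the site), a small stdlib-only helper can turn each printed pair into an absolute URL plus make, model, and year fields. `parse_listing` and `BASE_URL` are hypothetical names.

```python
from urllib.parse import urljoin

# Assumed base: the hrefs printed above are relative to this host.
BASE_URL = "https://ksa.yallamotor.com"


def parse_listing(title, href):
    """Split one (title, href) pair into fields.

    Assumes href looks like /used-cars/<make>/<model>/<year>/<slug>,
    a format inferred from the sample output, not a documented one.
    """
    parts = href.strip("/").split("/")
    if len(parts) < 5 or parts[0] != "used-cars":
        raise ValueError(f"unexpected href format: {href!r}")
    _, make, model, year, _slug = parts[:5]
    return {
        "title": " ".join(title.split()),  # collapse the double spaces in titles
        "make": make,
        "model": model,
        "year": int(year),
        "url": urljoin(BASE_URL, href),  # absolute URL for the detail page
    }


if __name__ == "__main__":
    row = parse_listing(
        "Used BMW 7 Series  730Li 2018",
        "/used-cars/bmw/7-series/2018/used-bmw-7-series-2018-jeddah-1294758",
    )
    print(row)
```

Collecting these dicts in a list and passing it to `pd.DataFrame(rows)` would give the kind of table the question's `pandas` import suggests.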