How to scrape all results from a list instead of being limited to only 20?

Time:11-16

I'm using BeautifulSoup to scrape some data from cars24.com. The resulting list, however, only contains details for 20 cars. That's odd, since the page shows many more cars (I checked by saving it). What am I doing wrong, and how can I get it to scrape the whole page?

This is my code:

from bs4 import BeautifulSoup as bs
import requests

link = 'https://www.cars24.com/buy-used-car?sort=P&storeCityId=2&pinId=110001'
page = requests.get(link)
soup = bs(page.content, 'html.parser')

# Car titles are the <h2> elements with this class
car_name = soup.find_all('h2', class_='_3FpCg')
cust_name = [tag.get_text() for tag in car_name]
print(cust_name)

Is there a workaround for this? Appreciate the help.

CodePudding user response:

Use the API endpoint.

For example:

import requests

url = "https://api-sell24.cars24.team/buy-used-car?sort=P&serveWarrantyCount=true&gaId=&page=1&storeCityId=2&pinId=110001"
cars = requests.get(url).json()['data']['content']
base = "https://www.cars24.com/buy-used-"

for car in cars:
    car_name = "-".join(car['carName'].lower().split())
    car_city = "-".join(car['city'].lower().split())
    offer = f"{base}{car_name}-{car['year']}-cars-{car_city}-{car['carId']}"
    print(f"{car['carName']} - {car['year']} - {car['price']}")
    print(offer)

Output:

Maruti Swift Dzire - 2010 - 256299
https://www.cars24.com/buy-used-maruti-swift-dzire-2010-cars-new-delhi-10084891724
Hyundai Grand i10 - 2018 - 526599
https://www.cars24.com/buy-used-hyundai-grand-i10-2018-cars-noida-10572294761
Datsun Redi Go - 2018 - 234499
https://www.cars24.com/buy-used-datsun-redi-go-2018-cars-gurgaon-11073694705
Maruti Swift - 2020 - 566499
https://www.cars24.com/buy-used-maruti-swift-2020-cars-faridabad-11041770770
Hyundai i10 - 2009 - 170699
https://www.cars24.com/buy-used-hyundai-i10-2009-cars-rohtak-1007315463
Maruti Swift - 2020 - 577399
https://www.cars24.com/buy-used-maruti-swift-2020-cars-new-delhi-10065678773
Hyundai Grand i10 - 2018 - 508799
https://www.cars24.com/buy-used-hyundai-grand-i10-2018-cars-ghaziabad-11261195767
Maruti Swift - 2020 - 587599
https://www.cars24.com/buy-used-maruti-swift-2020-cars-new-delhi-10016194709
Maruti Swift - 2020 - 524099
https://www.cars24.com/buy-used-maruti-swift-2020-cars-new-delhi-10010390743
Hyundai AURA - 2021 - 675099
https://www.cars24.com/buy-used-hyundai-aura-2021-cars-faridabad-11095494760
Maruti Swift - 2019 - 541899
https://www.cars24.com/buy-used-maruti-swift-2019-cars-new-delhi-10016570794
Hyundai Grand i10 - 2019 - 490449
https://www.cars24.com/buy-used-hyundai-grand-i10-2019-cars-noida-10532691707
Hyundai Santro Xing - 2013 - 281999
https://www.cars24.com/buy-used-hyundai-santro-xing-2013-cars-gurgaon-10168291760
Hyundai Santro Xing - 2014 - 272099
https://www.cars24.com/buy-used-hyundai-santro-xing-2014-cars-gurgaon-10121974770
Mercedes Benz C Class - 2014 - 1854499
https://www.cars24.com/buy-used-mercedes-benz-c-class-2014-cars-new-delhi-1050064264
KIA CARENS - 2022 - 1608099
https://www.cars24.com/buy-used-kia-carens-2022-cars-gurgaon-10160777793
Tata ALTROZ - 2021 - 711599
https://www.cars24.com/buy-used-tata-altroz-2021-cars-new-delhi-10083196703
Maruti New  Wagon-R - 2020 - 508899
https://www.cars24.com/buy-used-maruti-new-wagon-r-2020-cars-new-delhi-10084875775
Hyundai Grand i10 - 2018 - 509099
https://www.cars24.com/buy-used-hyundai-grand-i10-2018-cars-new-delhi-10011277773
Maruti Wagon R 1.0 - 2011 - 282499
https://www.cars24.com/buy-used-maruti-wagon-r-1.0-2011-cars-new-delhi-10080499706
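The offer URL built inside the loop can also be pulled out into a small helper, which makes it easier to check against the output above. This is only a sketch: the names slugify and build_offer_url are mine, and the sample record holds just the four fields the URL needs (real API records carry more).

```python
BASE = "https://www.cars24.com/buy-used-"


def slugify(text):
    """Lowercase the text and join its whitespace-separated words with hyphens."""
    # str.split() with no argument also collapses repeated spaces
    return "-".join(text.lower().split())


def build_offer_url(car):
    """Build the listing URL from one car record of the API response."""
    name = slugify(car["carName"])
    city = slugify(car["city"])
    return f"{BASE}{name}-{car['year']}-cars-{city}-{car['carId']}"


# Sample record with values copied from the output above (note the double space)
car = {
    "carName": "Maruti New  Wagon-R",
    "city": "New Delhi",
    "year": 2020,
    "carId": 10084875775,
}
print(build_offer_url(car))
# https://www.cars24.com/buy-used-maruti-new-wagon-r-2020-cars-new-delhi-10084875775
```

Because split() drops the extra whitespace, the double space in "Maruti New  Wagon-R" does not end up as a double hyphen in the URL.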

Note: You can paginate the API by incrementing the value of the page parameter in the URL.

For example:

import requests
import pandas as pd

base = "https://www.cars24.com/buy-used-"

table = []
with requests.Session() as s:
    for page in range(1, 11):
        url = f"https://api-sell24.cars24.team/buy-used-car?sort=P&serveWarrantyCount=true&gaId=&page={page}&storeCityId=2&pinId=110001"
        cars = s.get(url).json()['data']['content']
        print(f"Getting page {page}...")
        for car in cars:
            car_name = "-".join(car['carName'].lower().split())
            car_city = "-".join(car['city'].lower().split())
            offer_url = f"{base}{car_name}-{car['year']}-cars-{car_city}-{car['carId']}"
            table.append([car['carName'], car['year'], car['price'], offer_url])

df = pd.DataFrame(table, columns=['Car Name', 'Year', 'Price', 'Offer URL'])
df.to_csv('cars.csv', index=False)

Output: a .csv file (cars.csv).

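The loop above stops at a fixed ten pages. If you want every result, one option is to keep requesting pages until one comes back empty. The sketch below assumes (I have not verified this) that the API returns an empty content list once you go past the last page; fetch_page, fetch_all and max_pages are names I made up, and the query parameters are simply the ones from the URL above.

```python
API_URL = "https://api-sell24.cars24.team/buy-used-car"


def fetch_page(session, page):
    """Fetch one page of listings with a requests.Session-like object."""
    params = {
        "sort": "P",
        "serveWarrantyCount": "true",
        "gaId": "",
        "page": page,
        "storeCityId": 2,
        "pinId": 110001,
    }
    response = session.get(API_URL, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["data"]["content"]


def fetch_all(fetch, max_pages=1000):
    """Call fetch(page) for page 1, 2, ... until a page comes back empty."""
    cars = []
    for page in range(1, max_pages + 1):  # max_pages is a safety cap
        batch = fetch(page)
        if not batch:  # ASSUMPTION: an empty page means there are no more results
            break
        cars.extend(batch)
    return cars


# Usage (needs `import requests` and network access):
# with requests.Session() as s:
#     cars = fetch_all(lambda page: fetch_page(s, page))
```

Passing the query as a params dict instead of formatting it into the URL keeps the page number separate from the rest of the query string, and the cap guards against looping forever if the assumption about the empty page turns out to be wrong.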

CodePudding user response:

For more details, @baduker: I am using the code below.

import requests
import pandas as pd

base = "https://www.cars24.com/buy-used-"

table = []
with requests.Session() as s:
    for page in range(1, 2):
        url = "https://api-sell24.cars24.team/buy-used-car"
        cars = s.get(url).json()['data']['content']
        #print(f"Getting page {page}...")
        for car in cars:
            name = "-".join(car['carName'].lower().split())
            year = car['year']
            city = "-".join(car['city'].lower().split())
            offer_url = f"{base}{name}-{car['year']}-cars-{city}-{car['carId']}"
            table.append([car['carName'], car['city'], car['year'], car['fuelType'], car['kilometerDriven'], car['isC24Assured'], car['registrationState'], car['ownerNumber'], car['bodyType'], car['discountPrice'], car['price'], offer_url])
        

df = pd.DataFrame(table, columns=['name', 'city', 'year', 'fuelType', 'kilometerDriven', 'isC24Assured', 'registrationState', 'ownerNumber', 'bodyType', 'discountPrice', 'price', 'URL'])
df.head(2)
#df.to_csv('cars.csv', index=False)

Using this, I have extracted a few variables, but I need more for my study. For that, I need to visit each car's URL and get the features and specifications of the car.

Let's say the first car URL is hyundaicar.

I assume the logic should be a for loop over all the URLs, extracting the data from each one, but I don't know the API for the individual car pages.

Please guide me. Thanks in advance.
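To make the for-loop idea concrete, here is a rough sketch of what I have in mind. It does not solve the actual problem: collect_details, get_html and parse are hypothetical names, and the parse function (whatever pulls the features and specifications out of one car page) is exactly the part I don't know how to write yet.

```python
import time


def collect_details(urls, get_html, parse, delay=1.0):
    """Visit each URL, parse the page, and return one dict per car."""
    rows = []
    for url in urls:
        html = get_html(url)       # download the page for this car
        details = dict(parse(html))  # hypothetical parser for one car page
        details["URL"] = url       # keep the URL so rows can be joined back to df
        rows.append(details)
        time.sleep(delay)          # pause between requests to go easy on the site
    return rows


# Usage (needs `import requests`, network access, and a real parser):
# with requests.Session() as s:
#     rows = collect_details(df["URL"], lambda u: s.get(u).text, parse_car_page)
# details_df = pd.DataFrame(rows)
# full_df = df.merge(details_df, on="URL")
```

Here parse_car_page is a placeholder for a function that takes the page HTML and returns a dict of the extra variables; the merge on the URL column would then attach them to the table I already have.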
