I'm using BeautifulSoup to scrape data from cars24.com. The resulting list, however, contains details for only 20 cars. That's odd, because the page lists many more (I checked by saving it). What am I doing wrong, and how can I scrape the whole page?
This is my code:
from bs4 import BeautifulSoup as bs
import requests

link = 'https://www.cars24.com/buy-used-car?sort=P&storeCityId=2&pinId=110001'
page = requests.get(link)
soup = bs(page.content, 'html.parser')
car_name = soup.find_all('h2', class_='_3FpCg')
cust_name = []
for i in range(0, len(car_name)):
    cust_name.append(car_name[i].get_text())
cust_name
Is there a workaround for this? Appreciate the help.
CodePudding user response:
Use the API endpoint.
For example:
import requests
url = "https://api-sell24.cars24.team/buy-used-car?sort=P&serveWarrantyCount=true&gaId=&page=1&storeCityId=2&pinId=110001"
cars = requests.get(url).json()['data']['content']
base = "https://www.cars24.com/buy-used-"
for car in cars:
    car_name = "-".join(car['carName'].lower().split())
    car_city = "-".join(car['city'].lower().split())
    offer = f"{base}{car_name}-{car['year']}-cars-{car_city}-{car['carId']}"
    print(f"{car['carName']} - {car['year']} - {car['price']}")
    print(offer)
Output:
Maruti Swift Dzire - 2010 - 256299
https://www.cars24.com/buy-used-maruti-swift-dzire-2010-cars-new-delhi-10084891724
Hyundai Grand i10 - 2018 - 526599
https://www.cars24.com/buy-used-hyundai-grand-i10-2018-cars-noida-10572294761
Datsun Redi Go - 2018 - 234499
https://www.cars24.com/buy-used-datsun-redi-go-2018-cars-gurgaon-11073694705
Maruti Swift - 2020 - 566499
https://www.cars24.com/buy-used-maruti-swift-2020-cars-faridabad-11041770770
Hyundai i10 - 2009 - 170699
https://www.cars24.com/buy-used-hyundai-i10-2009-cars-rohtak-1007315463
Maruti Swift - 2020 - 577399
https://www.cars24.com/buy-used-maruti-swift-2020-cars-new-delhi-10065678773
Hyundai Grand i10 - 2018 - 508799
https://www.cars24.com/buy-used-hyundai-grand-i10-2018-cars-ghaziabad-11261195767
Maruti Swift - 2020 - 587599
https://www.cars24.com/buy-used-maruti-swift-2020-cars-new-delhi-10016194709
Maruti Swift - 2020 - 524099
https://www.cars24.com/buy-used-maruti-swift-2020-cars-new-delhi-10010390743
Hyundai AURA - 2021 - 675099
https://www.cars24.com/buy-used-hyundai-aura-2021-cars-faridabad-11095494760
Maruti Swift - 2019 - 541899
https://www.cars24.com/buy-used-maruti-swift-2019-cars-new-delhi-10016570794
Hyundai Grand i10 - 2019 - 490449
https://www.cars24.com/buy-used-hyundai-grand-i10-2019-cars-noida-10532691707
Hyundai Santro Xing - 2013 - 281999
https://www.cars24.com/buy-used-hyundai-santro-xing-2013-cars-gurgaon-10168291760
Hyundai Santro Xing - 2014 - 272099
https://www.cars24.com/buy-used-hyundai-santro-xing-2014-cars-gurgaon-10121974770
Mercedes Benz C Class - 2014 - 1854499
https://www.cars24.com/buy-used-mercedes-benz-c-class-2014-cars-new-delhi-1050064264
KIA CARENS - 2022 - 1608099
https://www.cars24.com/buy-used-kia-carens-2022-cars-gurgaon-10160777793
Tata ALTROZ - 2021 - 711599
https://www.cars24.com/buy-used-tata-altroz-2021-cars-new-delhi-10083196703
Maruti New Wagon-R - 2020 - 508899
https://www.cars24.com/buy-used-maruti-new-wagon-r-2020-cars-new-delhi-10084875775
Hyundai Grand i10 - 2018 - 509099
https://www.cars24.com/buy-used-hyundai-grand-i10-2018-cars-new-delhi-10011277773
Maruti Wagon R 1.0 - 2011 - 282499
https://www.cars24.com/buy-used-maruti-wagon-r-1.0-2011-cars-new-delhi-10080499706
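As a side note, the URL-building lines in the loop above could be pulled into a small helper. This is only a sketch: the names slugify and build_offer_url are made up here, and the sample record is reconstructed from the first entry of the output above (the city value "New Delhi" is inferred from the URL slug, not copied from an actual API response).

```python
BASE = "https://www.cars24.com/buy-used-"


def slugify(text):
    """Lower-case the text and join its whitespace-separated words with hyphens."""
    return "-".join(text.lower().split())


def build_offer_url(car):
    """Build the listing URL from one item of the API's 'content' list.

    Assumes the item has 'carName', 'year', 'city' and 'carId' keys,
    as used in the answer's loop.
    """
    name = slugify(car["carName"])
    city = slugify(car["city"])
    return f"{BASE}{name}-{car['year']}-cars-{city}-{car['carId']}"


# Hypothetical record, reconstructed from the first entry of the output above.
sample = {
    "carName": "Maruti Swift Dzire",
    "year": 2010,
    "city": "New Delhi",
    "carId": 10084891724,
}
print(build_offer_url(sample))
```

For the sample record this gives the same URL as the first entry in the output.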
Note: You can paginate the API by incrementing the value of page in the URL.
For example:
import requests
import pandas as pd
base = "https://www.cars24.com/buy-used-"
table = []
with requests.Session() as s:
    for page in range(1, 11):
        url = f"https://api-sell24.cars24.team/buy-used-car?sort=P&serveWarrantyCount=true&gaId=&page={page}&storeCityId=2&pinId=110001"
        cars = s.get(url).json()['data']['content']
        print(f"Getting page {page}...")
        for car in cars:
            car_name = "-".join(car['carName'].lower().split())
            car_city = "-".join(car['city'].lower().split())
            offer_url = f"{base}{car_name}-{car['year']}-cars-{car_city}-{car['carId']}"
            table.append([car['carName'], car['year'], car['price'], offer_url])

df = pd.DataFrame(table, columns=['Car Name', 'Year', 'Price', 'Offer URL'])
df.to_csv('cars.csv', index=False)
Output: a .csv file (cars.csv).
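If you don't know how many pages there are, the fixed range(1, 11) above could be replaced with a loop that stops at the first empty page. The sketch below rests on an assumption I have not verified: that the API returns an empty 'content' list once you go past the last page. The function names fetch_page and fetch_all, and the max_pages safety cap, are made up for illustration; the query parameters are the same ones used in the URL above.

```python
API = "https://api-sell24.cars24.team/buy-used-car"


def fetch_page(session, page):
    """Fetch one page of listings and return its 'content' list.

    `session` is expected to be a requests.Session.
    """
    params = {
        "sort": "P",
        "serveWarrantyCount": "true",
        "gaId": "",
        "page": page,
        "storeCityId": 2,
        "pinId": 110001,
    }
    resp = session.get(API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]["content"]


def fetch_all(get_page, max_pages=50):
    """Call get_page(1), get_page(2), ... and collect the results.

    Stops at the first empty page (ASSUMPTION: the API signals the end
    this way) or after max_pages, whichever comes first.
    """
    cars = []
    for page in range(1, max_pages + 1):
        batch = get_page(page)
        if not batch:
            break
        cars.extend(batch)
    return cars


# Usage (needs network access):
#   import requests
#   with requests.Session() as s:
#       cars = fetch_all(lambda page: fetch_page(s, page))
```

Passing the page getter in as a function keeps the stopping logic separate from the HTTP call, so it can be tried with canned data first.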
CodePudding user response:
For more details, @baduker, I am using the code below:
import requests
import pandas as pd

base = "https://www.cars24.com/buy-used-"
table = []
with requests.Session() as s:
    for page in range(1, 2):
        # No query parameters are sent here, so `page` is not part of the request
        url = "https://api-sell24.cars24.team/buy-used-car"
        cars = s.get(url).json()['data']['content']
        #print(f"Getting page {page}...")
        for car in cars:
            name = "-".join(car['carName'].lower().split())
            year = car['year']
            city = "-".join(car['city'].lower().split())
            offer_url = f"{base}{name}-{car['year']}-cars-{city}-{car['carId']}"
            table.append([car['carName'], car['city'], car['year'], car['fuelType'], car['kilometerDriven'], car['isC24Assured'], car['registrationState'], car['ownerNumber'], car['bodyType'], car['discountPrice'], car['price'], offer_url])

df = pd.DataFrame(table, columns=['name', 'city', 'year', 'fuelType', 'kilometerDriven', 'isC24Assured', 'registrationState', 'ownerNumber', 'bodyType', 'discountPrice', 'price', 'URL'])
df.head(2)
#df.to_csv('cars.csv', index=False)
Using this, I have extracted a few variables, but I need more for my study. For that, I need to open each car's URL and get the car's features and specifications.
I assume the logic should be a for loop over all the URLs, extracting the data from each one, but I don't know about the APIs behind those URLs.
Please guide me. Thanks in advance.
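To make the loop I have in mind concrete, here is a rough sketch. Nothing in this thread says what the detail pages contain or whether they have their own API, so the fetching and parsing steps are passed in as placeholder functions; collect_details, fetch and parse are names I made up.

```python
import time


def collect_details(urls, fetch, parse, delay=0.0):
    """Visit each URL and return one dict per URL.

    fetch(url) should return the page body; parse(body) should return a
    dict of extracted fields. Both are placeholders to be filled in once
    the structure of the detail pages is known.
    """
    rows = []
    for url in urls:
        try:
            body = fetch(url)
        except Exception as exc:  # broad on purpose in this sketch
            # Record the failure and move on to the next car
            rows.append({"URL": url, "error": str(exc)})
            continue
        row = {"URL": url}
        row.update(parse(body))
        rows.append(row)
        if delay:
            time.sleep(delay)  # pause between requests
    return rows


# Possible wiring with the DataFrame above (parse_specs is hypothetical):
#   with requests.Session() as s:
#       rows = collect_details(df['URL'], lambda u: s.get(u, timeout=30).text, parse_specs, delay=1.0)
#   details = pd.DataFrame(rows)
```

The resulting rows could then be merged back into df on the URL column.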