Scrape JSON data behind store locator on website

Time:08-07

I'm working on a school project and am trying to scrape the JSON data for car dealers near a location.

Here's the site I'm trying to scrape: https://www.chevrolet.com/dealer-locator#!?searchTerm=46077&searchType=null

If I inspect the network traffic, I see that the page pulls its data from: https://www.chevrolet.com/bypass/pcf/quantum-dealer-locator/v1/getDealers?desiredCount=25&distance=500&makeCodes=001&serviceCodes=019&latitude=39.95023380000001&longitude=-86.27887249999999&searchType=latLongSearch

However, when I open that URL directly, I get a Whitelabel error page:

Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.

Sat Aug 06 20:29:17 UTC 2022
There was an unexpected error (type=Bad Request, status=400).
Missing request header 'clientApplicationId' for method parameter of type String

I'm first trying to get access to the underlying JSON data; then I plan to pull it down with Python. I am very new to this, so any help is much appreciated. Thank you!

CodePudding user response:

You can try the code below:

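import requests
import pandas as pd

# The endpoint returns 400 unless the clientApplicationId header is present;
# the values below are copied from the browser's request in the Network tab.
headers = {
    'clientapplicationid': 'quantum',
    'locale': 'en-US'
}
url = 'https://www.chevrolet.com/bypass/pcf/quantum-dealer-locator/v1/getDealers?desiredCount=25&distance=500&makeCodes=001&serviceCodes=&latitude=39.95023380000001&longitude=-86.27887249999999&searchType=latLongSearch'
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()  # raise on an HTTP error instead of failing later in r.json()
# The dealer records live under payload -> dealers in the JSON response.
df = pd.DataFrame(r.json()['payload']['dealers'])
print(df)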

This will print out:

address bac dealerCode  dealerName  dealerUrl   departments distance    distanceProp    generalContact  generalOpeningHour  ... isCertifiedInternet isDealerChild   isDealerParent  makeCodes   partsOpeningHour    scheduleServiceUrl  sellingBrandCode    serviceOpeningHour  services    faxNumber
0   {'addressLine1': '4105 W 96TH ST', 'addressLin...   309991  25166   BILL ESTES CHEVROLET    https://www.billesteschevy.com  [{'code': '1', 'departmentHours': [{'dayOfWeek...   3.1274  {'value': 3.1274, 'measurementUnit': 'MI'}  {'phone1': '3178723315', 'phone2': '3178723315'}    [{'dayOfWeek': [1, 2, 3, 4], 'openFrom': '09:0...   ... True    False   True    [001, 001]  [{'dayOfWeek': [6], 'openFrom': '08:00 AM', 'o...   https://www.billesteschevy.com/ServiceApptForm  13  [{'dayOfWeek': [6], 'openFrom': '08:00 AM', 'o...   [{'code': '001', 'name': 'BUSINESS ASCT VEHICL...   317-872-1677
1   {'addressLine1': '5252 W 38TH ST', 'addressLin...   261314  55662   ANDY MOHR SPEEDWAY CHEVROLET INC.   https://www.andymohrspeedwaychevrolet.com   [{'code': '1', 'departmentHours': [{'dayOfWeek...   8.8525  {'value': 8.8525, 'measurementUnit': 'MI'}  {'phone1': '3174290371', 'phone2': ''}  [{'dayOfWeek': [5, 6], 'openFrom': '09:00 AM',...   ... True    False   True    [001, 001]  [{'dayOfWeek': [6], 'openFrom': '08:00 AM', 'o...   https://www.andymohrspeedwaychevrolet.com/Serv...   13  [{'dayOfWeek': [6], 'openFrom': '08:00 AM', 'o...   [{'code': '001', 'name': 'BUSINESS ASCT VEHICL...   317-216-7223
2   {'addressLine1': '3210 E 96th St', 'addressLin...   112923  25467   PENSKE CHEVROLET    https://www.penskechevy.com [{'code': '1', 'departmentHours': [{'dayOfWeek...   9.0652  {'value': 9.0652, 'measurementUnit': 'MI'}  {'phone1': '8884700790', 'phone2': '8662406644'}    [{'dayOfWeek': [2, 5, 6], 'openFrom': '09:00 A...   ... True    False   True    [001, 001]  [{'dayOfWeek': [6], 'openFrom': '08:00 AM', 'o...   https://www.penskechevy.com/ServiceApptForm 13  [{'dayOfWeek': [6], 'openFrom': '08:00 AM', 'o...   [{'code': '001', 'name': 'BUSINESS ASCT VEHICL...   317-814-4469
3   {'addressLine1': '1920 N Lebanon St', 'address...   309996  81349   BILL ESTES CHEVROLET BUICK GMC  https://www.billesteschevybuickgmc.com  [{'code': '1', 'departmentHours': [{'dayOfWeek...   12.9427 {'value': 12.9427, 'measurementUnit': 'MI'} {'phone1': '3178545995', 'phone2': '3178545995'}    [{'dayOfWeek': [6], 'openFrom': '09:00 AM', 'o...   ... True    False   True    [001, 001]  [{'dayOfWeek': [1, 2, 3, 4, 5], 'openFrom': '0...   https://www.billesteschevybuickgmc.com/Service...   13  [{'dayOfWeek': [1, 2, 3, 4, 5], 'openFrom': '0...   [{'code': '001', 'name': 'BUSINESS ASCT VEHICL...   765-482-0205
4   {'addressLine1': '183 S County Road 525 East',...   286573  12201   CHAMPION CHEVROLET  https://www.chevyofavon.com [{'code': '1', 'departmentHours': [{'dayOfWeek...   15.4439 {'value': 15.4439, 'measurementUnit': 'MI'} {'phone1': '3177456444', 'phone2': '3177456444'}    [{'dayOfWeek': [5, 6], 'openFrom': '09:00 AM',...   ... True    False   True    [001, 001]  [{'dayOfWeek': [6], 'openFrom': '07:30 AM', 'o...   https://www.chevyofavon.com/ServiceApptForm 13  [{'dayOfWeek': [6], 'openFrom': '07:30 AM', 'o...   [{'code': '001', 'name': 'BUSINESS ASCT VEHICL...   317-718-0566
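Several columns in the output above (address, generalContact, distanceProp) hold nested dictionaries rather than plain values. If one flat column per field is easier to work with, the nested dictionaries can be expanded into dotted keys. This is a minimal sketch in plain Python: the flatten helper is a hypothetical name introduced here, not part of the answer, and the sample record contains only a few fields taken from the first row shown above.

```python
def flatten(record, parent_key="", sep="."):
    """Expand nested dicts into a single dict with dotted keys.

    Lists are left untouched, since fields such as opening hours
    hold a variable number of entries.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# A cut-down dealer record using values from the first row of the output.
dealer = {
    "dealerName": "BILL ESTES CHEVROLET",
    "distance": 3.1274,
    "address": {"addressLine1": "4105 W 96TH ST"},
    "generalContact": {"phone1": "3178723315", "phone2": "3178723315"},
}

row = flatten(dealer)
print(row)
```

Applied to every record before building the DataFrame, this would give columns such as address.addressLine1 and generalContact.phone1. pandas also provides pd.json_normalize, which should do much the same thing directly on the list of dealer dictionaries.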

All I did was read the error message carefully and construct the headers it asked for, using the values shown under Request Headers for that XHR request in the Network tab of Chrome DevTools.
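To search around a different location, the same request can be assembled from its parts instead of a hard-coded URL. The following is a rough sketch using only the standard library. The function names build_dealer_request and fetch_dealers are hypothetical, the parameter and header values are the ones from the request above, and whether the site accepts a request that lacks the browser's other headers is an assumption that would need testing.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.chevrolet.com/bypass/pcf/quantum-dealer-locator/v1/getDealers"


def build_dealer_request(latitude, longitude, desired_count=25, distance=500):
    """Build (but do not send) the dealer lookup request."""
    params = {
        "desiredCount": desired_count,
        "distance": distance,
        "makeCodes": "001",
        "serviceCodes": "",
        "latitude": latitude,
        "longitude": longitude,
        "searchType": "latLongSearch",
    }
    # The header the 400 error complained about, plus the locale header
    # seen in the browser's request.
    headers = {
        "clientapplicationid": "quantum",
        "locale": "en-US",
    }
    return Request(f"{BASE_URL}?{urlencode(params)}", headers=headers)


def fetch_dealers(request, timeout=10):
    """Send the request and return the list of dealer records."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["payload"]["dealers"]


request = build_dealer_request(39.95023380000001, -86.27887249999999)
print(request.full_url)
```

Calling fetch_dealers(request) would then perform the network call. Keeping the request construction separate makes it possible to inspect the URL and headers without contacting the site.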
