I am trying to find the api from where this site is getting its data but I can't find it.
In the network tab I can see the data in the xhr response. Data is changed each time the other page is selected but I have to extract the data but don't know how to do it. I don't know from where the website is getting its data. I am purely new to this. Can you guide me how to get the data or scrape this website. I have tried to find examples related to this but not getting correct one. Thanks in advance.
CodePudding user response:
The code below will get you the data you are looking for.
Use the field recordCount
in order to set the range
you need to loop on.
How it works
The website is using and API call in order to get the data as JSON. It uses paging technique - it passes the page index and the page size to the server so the server knows what is the page offset and it knows which data to return. The code below simulate this activity - the loop increments the page index and this way we iterate over the data.
import requests
import time
headers = {
"accept": "application/json, text/javascript, */*; q=0.01",
"accept-language": "en-US,en;q=0.9,el;q=0.8,he;q=0.7,de;q=0.6,fr;q=0.5,it;q=0.4,es;q=0.3",
"cache-control": "no-cache",
"content-type": "application/x-www-form-urlencoded; charset=UTF-8",
"pragma": "no-cache",
"sec-ch-ua": "\"Google Chrome\";v=\"93\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"93\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"macOS\"",
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"x-requested-with": "XMLHttpRequest"
}
body = {'professionalType': 'General Contractor',
'Name': '',
'sortName': 'OverallScore',
'sortDirection': 'desc',
'pageIndex': 0,
'pageSize': 10}
url = 'https://govservices.dcra.dc.gov/contractorratingsystem/BuildingProfessionals/LoadProfessionalSearchResultsWithFilters'
for i in range(1, 3): # TODO use actual range based on 'recordCount' (in the response) and 'pageSize'
body['pageIndex'] = i
r = requests.post(url, headers=headers, data=body)
if r.status_code == 200:
print(f'{i} --> {r.json()}')
else:
print(f'status code is {r.status_code}')
time.sleep(1)
output
1 --> {'buildingProfessionals': [{'buildingProfessional': 'REVOLUTION SOLAR LLC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000062', 'projectCount': 822, 'planReviewScore': 96.1732900783996, 'applicationIntakeScore': 96.2433090024331, 'inspectionScore': 100, 'overAllProjectScore': 100, 'stopWorkOrders': 0, 'planReviewScoreRating': 4.80866450391998, 'applicationIntakeScoreRating': 4.812165450121655, 'inspectionScoreRating': 5, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '10746 JUDY LANE COLUMBIA MD 21044', 'businessPhone': '4438655039', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'AMERICAN AUTOMATIC SPRINKLER CO', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410514000016', 'projectCount': 471, 'planReviewScore': 0.29603315571344, 'applicationIntakeScore': 0.245115452930728, 'inspectionScore': 99.7122042886194, 'overAllProjectScore': 100, 'stopWorkOrders': 12, 'planReviewScoreRating': 0.014801657785672, 'applicationIntakeScoreRating': 0.0122557726465364, 'inspectionScoreRating': 4.98561021443097, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '3149 DRAPER DRIVE FAIRFAX VA 22031', 'businessPhone': '7038498180', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'FIRE & LIFE SAFETY AMERICA INC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410516000410', 'projectCount': 348, 'planReviewScore': 0.218818380743982, 'applicationIntakeScore': 0.218818380743982, 'inspectionScore': 99.781181619256, 'overAllProjectScore': 100, 'stopWorkOrders': 2, 'planReviewScoreRating': 0.0109409190371991, 'applicationIntakeScoreRating': 0.0109409190371991, 'inspectionScoreRating': 4.9890590809628, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '3017 VERNON ROAD RICHMOND VA 23228', 'businessPhone': '8042221381', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'NORTHERN FIRE PROTECTION, INC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410516000183', 'projectCount': 250, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 98.7889273356401, 'overAllProjectScore': 100, 'stopWorkOrders': 2, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.939446366782005, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '21530 BLACKWOOD COURT SUITE #150 STERLING VA 20166', 'businessPhone': '7034069811', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'PHOENIX FIRE PROTECTION INC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000155', 'projectCount': 174, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 100, 'overAllProjectScore': 100, 'stopWorkOrders': 4, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 5, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '7901 PENN RANDALL PLACE UPPER MARLBORO MD 20772', 'businessPhone': '3016697066', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'JENSON FIRE PROTECTION INC', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410517000309', 'projectCount': 146, 'planReviewScore': 0.632911392405063, 'applicationIntakeScore': 0.632911392405063, 'inspectionScore': 100, 'overAllProjectScore': 100, 'stopWorkOrders': 0, 'planReviewScoreRating': 0.03164556962025315, 'applicationIntakeScoreRating': 0.03164556962025315, 'inspectionScoreRating': 5, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '8740 CHERRY LANE UNIT 13 LAUREL MD 20707', 'businessPhone': '', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'LIVINGSTON FIRE PROTECTION INC', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410516000203', 'projectCount': 145, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 98.3734939759036, 'overAllProjectScore': 100, 'stopWorkOrders': 5, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.91867469879518, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '5150 LAWRENCE PLACE HYATTSVILLE MD 20781', 'businessPhone': '3017794466', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'RIDGEWAY CORPORATION PROFESSIONAL CORPORATION', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000087', 'projectCount': 145, 'planReviewScore': 0.798403193612774, 'applicationIntakeScore': 1.4251497005988, 'inspectionScore': 95.688622754491, 'overAllProjectScore': 100, 'stopWorkOrders': 4, 'planReviewScoreRating': 0.0399201596806387, 'applicationIntakeScoreRating': 0.07125748502994, 'inspectionScoreRating': 4.78443113772455, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '12514 KENSINGTON LANE BOWIE MD 20715', 'businessPhone': '3014642003', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'FORTRESS PROTECTION GROUP', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000115', 'projectCount': 124, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 99.4932432432432, 'overAllProjectScore': 100, 'stopWorkOrders': 5, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.97466216216216, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '18618 BROKEN OAK RD BOYDS MD 20841', 'businessPhone': '', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'PRIME FIRE PROTECTION LLC', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410517000488', 'projectCount': 120, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 94.4320987654321, 'overAllProjectScore': 100, 'stopWorkOrders': 20, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.721604938271605, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '13549 JAMIESON PL GERMANTOWN MD 20874', 'businessPhone': '3104736189', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}], 'pageIndex': 1, 'pageSize': 10, 'recordCount': 1113}
...
CodePudding user response:
You could use requests to hit the url which generates the response and then use Beautiful soup to parse it maybe?