I am trying to extract the data for every school listed on the following site:
https://schulfinder.kultus-bw.de/
My code is this:
import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from requests import get
from selenium.webdriver.common.by import By
import json
url = "https://schulfinder.kultus-bw.de/api/school?uuid=81af189c-7bc0-44a3-8c9f-73e6d6e50fdb&_=1675072758525"
payload = {}
headers = {}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
Output is this:
{
"outpost_number": "0",
"name": "Gartenschule Grundschule Ebnat",
"street": "Abt-Angehrn-Str.",
"house_number": "5",
"postcode": "73432",
"city": "Aalen",
"phone": " 49736796700",
"fax": " 497367967016",
"email": "[email protected]",
"website": null,
"tablet_tranche": null,
"tablet_platform": null,
"tablet_branches": null,
"tablet_trades": null,
"lat": 48.80094,
"lng": 10.18761,
"official": 0,
"branches": [
{
"branch_id": 12110,
"acronym": "GS",
"description_long": "Grundschule"
}
],
"trades": []
}
I found the URL in the Network tab of the Chrome inspector and tested the request in Postman. My problem is that I only get the information for one school, and I can't figure out how to request all the schools.
CodePudding user response:
In addition to the answer already given: to find the search criteria accepted by the GET request to the API, you can parse the contents of the main page with BeautifulSoup, which you have already imported:
from bs4 import BeautifulSoup
import requests
search_page_url = "https://schulfinder.kultus-bw.de"
# Download the search page and parse its HTML
page_contents = requests.get(search_page_url).text
parsed_html = BeautifulSoup(page_contents, features="html.parser")
# Collect (name, type, value) for every <input> element on the page
input_elements = parsed_html.body.find_all('input')
search_params = [(x.get('name'), x.get('type'), x.get('value')) for x in input_elements]
search_params then contains one (name, type, value) tuple per input element. It should give you insight into the available parameters and their possible values.
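To illustrate what such tuples look like without fetching the live page, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, and the HTML snippet is a made-up stand-in: the parameter names are taken from the API URL in this thread, but the real page's markup, input types, and values may well differ.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the search page markup; the real page may differ.
SAMPLE_HTML = """
<form>
  <input type="text" name="term" value="">
  <input type="checkbox" name="outposts" value="1">
  <input type="hidden" name="distance" value="1">
</form>
"""


class InputCollector(HTMLParser):
    """Collect a (name, type, value) tuple for every <input> element."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            self.inputs.append(
                (attributes.get("name"), attributes.get("type"), attributes.get("value"))
            )


collector = InputCollector()
collector.feed(SAMPLE_HTML)
search_params = collector.inputs
print(search_params)
```

If the site builds its form with JavaScript, the static HTML may contain few or no input elements, in which case the Network tab remains the more reliable source for parameter names.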
CodePudding user response:
Simply use the correct endpoint:
https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084
That will give you a list of schools, whose uuid values can then be used to request further data via the endpoint from your question (https://schulfinder.kultus-bw.de/api/school?...):
[{"uuid":"50de01a4-503d-44d1-af4b-a6031a022b85","outpost_number":"0","name":"Grundschule Aach","city":"Aach","lat":47.84399,"lng":8.85067,"official":0,"marker_class":"marker green","marker_label":"G","website":null},{"uuid":"8818037f-9aed-4860-b42e-8a49b1403c02","outpost_number":"0","name":"Braunenbergschule Grundschule Wasseralfingen","city":"Aalen","lat":48.8612,"lng":10.11191,"official":0,"marker_class":"marker green","marker_label":"G","website":null},...]
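Rather than editing the query string by hand, the search URL can be assembled from a dictionary. The following is a minimal sketch: build_search_url is a hypothetical helper, the parameter names are copied from the URL above, and treating the trailing `_` value as a millisecond timestamp (probably a cache-busting value added by the site's JavaScript) is an assumption.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://schulfinder.kultus-bw.de/api/schools"


def build_search_url(term="", distance=1, outposts=1, timestamp=None):
    """Build a search URL with the same parameters as the URL shown above."""
    if timestamp is None:
        # Assumption: "_" is just the current time in milliseconds
        timestamp = int(time.time() * 1000)
    params = {
        "distance": distance,
        "outposts": outposts,
        "owner": "",
        "school_kind": "",
        "term": term,
        "types": "",
        "work_schedule": "",
        "_": timestamp,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Reproduces the URL from this answer when given the same timestamp
url = build_search_url(timestamp=1675079497084)
print(url)
```

With requests, the same dictionary could instead be passed as requests.get(BASE_URL, params=params), which does the encoding for you.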
Be aware that the result is limited to 500 entries, so you have to apply filters and combine the results to get all of the schools. When the limit is hit, the site reports (translated from German):
The search limit has been reached. No more than 500 results are displayed. Please refine your search, e.g. by specifying a location.
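One way to combine several filtered searches is to merge the results by uuid, so that schools matching more than one filter are kept only once. The sketch below is a rough outline under stated assumptions: collect_schools and fake_fetch are hypothetical names, the idea that the term parameter accepts a place name is a guess based on the message above, and how the API signals the limit in its JSON response is not shown in this thread, so the length check is only a heuristic.

```python
def collect_schools(fetch, terms, limit=500):
    """Run one search per term and merge the results by uuid.

    fetch is any callable that takes a search term and returns a list of
    dicts with a 'uuid' key, for example a wrapper around
    requests.get(search_url).json().
    """
    schools = {}
    truncated = []
    for term in terms:
        results = fetch(term)
        # Heuristic: a result this large was probably cut off at the limit
        if len(results) >= limit:
            truncated.append(term)
        for school in results:
            schools[school["uuid"]] = school
    return list(schools.values()), truncated


# Stand-in for the real API call, so the sketch runs offline
def fake_fetch(term):
    aach = {"uuid": "50de01a4-503d-44d1-af4b-a6031a022b85", "city": "Aach"}
    aalen = {"uuid": "8818037f-9aed-4860-b42e-8a49b1403c02", "city": "Aalen"}
    return [s for s in (aach, aalen) if s["city"].startswith(term)]


schools, truncated = collect_schools(fake_fetch, ["Aach", "Aalen", "Aa"])
print(len(schools), truncated)
```

Any term that ends up in truncated would need to be split into narrower searches before the combined list can be considered complete.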
Example
import requests

list_url = 'https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084'
data = []
# Fetch the list of schools, then request the details for each uuid
for uuid in [item['uuid'] for item in requests.get(list_url).json()]:
    detail_url = f'https://schulfinder.kultus-bw.de/api/school?uuid={uuid}&_=1675072758525'
    data.append(requests.get(detail_url).json())
data
Output
[{'outpost_number': '0', 'name': 'Grundschule Aach', 'street': 'Schulstr.', 'house_number': '5', 'postcode': '78267', 'city': 'Aach', 'phone': ' 4977741442', 'fax': None, 'email': '[email protected]', 'website': None, 'tablet_tranche': None, 'tablet_platform': None, 'tablet_branches': None, 'tablet_trades': None, 'lat': 47.84399, 'lng': 8.85067, 'official': 0, 'branches': [{'branch_id': 12110, 'acronym': 'GS', 'description_long': 'Grundschule'}], 'trades': []}, {'outpost_number': '0', 'name': 'Braunenbergschule Grundschule Wasseralfingen', 'street': 'Steinstr.', 'house_number': '38', 'postcode': '73433', 'city': 'Aalen', 'phone': ' 49736197700', 'fax': ' 497361977019', 'email': '[email protected]', 'website': 'http://www.braunenbergschule.de', 'tablet_tranche': None, 'tablet_platform': None, 'tablet_branches': None, 'tablet_trades': None, 'lat': 48.8612, 'lng': 10.11191, 'official': 0, 'branches': [{'branch_id': 12110, 'acronym': 'GS', 'description_long': 'Grundschule'}], 'trades': []},...]