How to scrape and extract data from JSON file?

Time:02-01

I am trying to extract all the data for every school on the following site:

https://schulfinder.kultus-bw.de/

My code is this:

import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from requests import get
from selenium.webdriver.common.by import By
import json

url = "https://schulfinder.kultus-bw.de/api/school?uuid=81af189c-7bc0-44a3-8c9f-73e6d6e50fdb&_=1675072758525"

payload = {}
headers = {}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

Output is this:

{
  "outpost_number": "0",
  "name": "Gartenschule Grundschule Ebnat",
  "street": "Abt-Angehrn-Str.",
  "house_number": "5",
  "postcode": "73432",
  "city": "Aalen",
  "phone": " 49736796700",
  "fax": " 497367967016",
  "email": "[email protected]",
  "website": null,
  "tablet_tranche": null,
  "tablet_platform": null,
  "tablet_branches": null,
  "tablet_trades": null,
  "lat": 48.80094,
  "lng": 10.18761,
  "official": 0,
  "branches": [
    {
      "branch_id": 12110,
      "acronym": "GS",
      "description_long": "Grundschule"
    }
  ],
  "trades": []
}

I found the URL in the Network tab of the Chrome inspector and tested the request in Postman. My problem is that I only get the information for one school, and I can't figure out how to request all the schools.

CodePudding user response:

In addition to the answer already given.

To find all the search criteria accepted by the API's GET request, you can parse the main page with BeautifulSoup, which you have already imported:

from bs4 import BeautifulSoup
import requests

search_page_url = "https://schulfinder.kultus-bw.de"
page_contents = requests.request("GET", search_page_url).text

parsed_html = BeautifulSoup(page_contents, features="html.parser")
input_elements = parsed_html.body.find_all('input')
search_params = [(x.get('name'), x.get('type'), x.get('value')) for x in input_elements]

search_params contains tuples of a name, type, and value. It should give you insights into parameters and their possible values.
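
To make that list easier to read, the tuples can be grouped by input name. The following is only a minimal sketch: `group_by_name` is a hypothetical helper, not part of the answer above, and it assumes `search_params` has the `(name, type, value)` shape produced by the snippet.

```python
from collections import defaultdict


def group_by_name(search_params):
    """Map each input name to the list of (type, value) pairs seen for it."""
    grouped = defaultdict(list)
    for name, input_type, value in search_params:
        if name:  # skip inputs that have no name attribute
            grouped[name].append((input_type, value))
    return dict(grouped)
```

Calling `group_by_name(search_params)` then shows, per parameter, which values the page offers, for example all checkbox values that share one name.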

CodePudding user response:

Simply use the correct endpoint:

https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084

That will give you a list of schools. Each entry contains a uuid, which you can use to request the full record from the endpoint in your question (https://schulfinder.kultus-bw.de/api/school?...).

[{"uuid":"50de01a4-503d-44d1-af4b-a6031a022b85","outpost_number":"0","name":"Grundschule Aach","city":"Aach","lat":47.84399,"lng":8.85067,"official":0,"marker_class":"marker green","marker_label":"G","website":null},{"uuid":"8818037f-9aed-4860-b42e-8a49b1403c02","outpost_number":"0","name":"Braunenbergschule Grundschule Wasseralfingen","city":"Aalen","lat":48.8612,"lng":10.11191,"official":0,"marker_class":"marker green","marker_label":"G","website":null},...]

Be aware that the result is limited to 500 entries, so to get all schools you have to apply filters and combine the results of several requests. The site reports this limit with the following message (translated from German):

The search limit has been reached. More than 500 results are not shown. Please refine your search, e.g. by specifying a location.
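
One way to combine several filtered requests is sketched below, using only the standard library. It rests on assumptions that are not confirmed here: that the `term` parameter accepts a place name, that the `_` timestamp parameter can be omitted, and that each filtered query stays under the limit. The helper names (`build_url`, `fetch_json`, `collect_schools`) are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://schulfinder.kultus-bw.de/api/schools"

# Query parameters copied from the endpoint above; only `term` is varied.
# Assumption: the `_` cache-busting timestamp is optional and left out.
BASE_PARAMS = {
    "distance": 1, "outposts": 1, "owner": "", "school_kind": "",
    "term": "", "types": "", "work_schedule": "",
}


def build_url(term):
    """Return the list-endpoint URL for one search term."""
    return f"{API_URL}?{urlencode({**BASE_PARAMS, 'term': term})}"


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_schools(terms, fetch=fetch_json):
    """Query once per term and merge the results, de-duplicating by uuid."""
    schools = {}
    for term in terms:
        for item in fetch(build_url(term)):
            # Keep the first record seen for each uuid.
            schools.setdefault(item["uuid"], item)
    return list(schools.values())
```

A call such as `collect_schools(["Aach", "Aalen"])` would then return one merged list. Which terms are needed to cover every school is not something this sketch can answer.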

Example

import requests

list_url = 'https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084'

data = []

# Fetch the list of schools, then request the full record for each uuid.
for uuid in [item['uuid'] for item in requests.get(list_url).json()]:
    detail_url = f'https://schulfinder.kultus-bw.de/api/school?uuid={uuid}&_=1675072758525'
    data.append(
        requests.get(detail_url).json()
    )

data

Output

[{'outpost_number': '0', 'name': 'Grundschule Aach', 'street': 'Schulstr.', 'house_number': '5', 'postcode': '78267', 'city': 'Aach', 'phone': ' 4977741442', 'fax': None, 'email': '[email protected]', 'website': None, 'tablet_tranche': None, 'tablet_platform': None, 'tablet_branches': None, 'tablet_trades': None, 'lat': 47.84399, 'lng': 8.85067, 'official': 0, 'branches': [{'branch_id': 12110, 'acronym': 'GS', 'description_long': 'Grundschule'}], 'trades': []}, {'outpost_number': '0', 'name': 'Braunenbergschule Grundschule Wasseralfingen', 'street': 'Steinstr.', 'house_number': '38', 'postcode': '73433', 'city': 'Aalen', 'phone': ' 49736197700', 'fax': ' 497361977019', 'email': '[email protected]', 'website': 'http://www.braunenbergschule.de', 'tablet_tranche': None, 'tablet_platform': None, 'tablet_branches': None, 'tablet_trades': None, 'lat': 48.8612, 'lng': 10.11191, 'official': 0, 'branches': [{'branch_id': 12110, 'acronym': 'GS', 'description_long': 'Grundschule'}], 'trades': []},...]
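
To extract individual fields from these records, something like the following may help. It is a sketch that assumes every record has the keys shown in the output above. The choice of columns and the helper names `flatten` and `to_csv` are illustrative only.

```python
import csv
import io

# Columns to keep; all of them appear in the sample output above.
FIELDS = ["name", "street", "house_number", "postcode", "city",
          "phone", "email", "website"]


def flatten(school):
    """Reduce one school record to a flat dict suitable for a CSV row."""
    row = {field: school.get(field) for field in FIELDS}
    # Join the acronyms of all branches into one cell, e.g. "GS".
    row["branches"] = ";".join(b["acronym"] for b in school.get("branches") or [])
    return row


def to_csv(schools):
    """Render a list of school records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS + ["branches"])
    writer.writeheader()
    writer.writerows(flatten(s) for s in schools)
    return buffer.getvalue()
```

With the `data` list from the example, `to_csv(data)` returns text that can be written to a file and opened in a spreadsheet.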