I am new to programming and i am trying to parse this page: https://ruz.spbstu.ru/faculty/100/groups
url = "https://ruz.spbstu.ru/faculty/100/groups"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scripts = soup.find_all('script')
print(scripts[3].text)
this gives me
window.__INITIAL_STATE__ = {"faculties":{"isFetching":false,"data":null,"errors":null},"groups":{"isFetching":false,"data":{"100":[{"id":35754,"name":"3733806/00301","level":3,"type":"common","kind":0,"spec":"38.03.06 Торговое дело","year":2022},{"id":35715,"name":"3763801/10103","level":2,"type":"common","kind":3,"spec":"38.06.01 Экономика","year":2022},{"id":34725,"name":"з3753801/80430_2021","level":5,"type":"distance","kind":2,"spec":"38.05.01 Экономическая безопасность","year":2022},{"id":33632,"name":"3733801/10002_2021","level":2,"type":"common","kind":0,"spec":"38.03.01 Экономика","year":2022}...........
contents are very long so this is an extract from the output.
i need get all 'id's and 'name's from this output and put them into the dictionary like {id:name}, i can't figure out a way how to do it. Any information will be very helpful.
CodePudding user response:
Try:
import re
import json
import requests
from bs4 import BeautifulSoup
url = "https://ruz.spbstu.ru/faculty/100/groups"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scripts = soup.find_all("script")
data = re.search(r".*?({.*});", scripts[3].text).group(1)
data = json.loads(data)
out = {d["id"]: d["name"] for d in data["groups"]["data"]["100"]}
print(out)
Prints:
{35754: '3733806/00301', 35715: '3763801/10103', ...etc.