Get data from bs4 script

Time: 11-03

I am new to programming and I am trying to parse this page: https://ruz.spbstu.ru/faculty/100/groups. So far I have:

import requests
from bs4 import BeautifulSoup

url = "https://ruz.spbstu.ru/faculty/100/groups"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scripts = soup.find_all("script")
print(scripts[3].text)

This prints:

window.__INITIAL_STATE__ = {"faculties":{"isFetching":false,"data":null,"errors":null},"groups":{"isFetching":false,"data":{"100":[{"id":35754,"name":"3733806/00301","level":3,"type":"common","kind":0,"spec":"38.03.06 Торговое дело","year":2022},{"id":35715,"name":"3763801/10103","level":2,"type":"common","kind":3,"spec":"38.06.01 Экономика","year":2022},{"id":34725,"name":"з3753801/80430_2021","level":5,"type":"distance","kind":2,"spec":"38.05.01 Экономическая безопасность","year":2022},{"id":33632,"name":"3733801/10002_2021","level":2,"type":"common","kind":0,"spec":"38.03.01 Экономика","year":2022}...........

The output is very long, so this is only an extract.

I need to get every 'id' and 'name' from this output and put them into a dictionary of the form {id: name}, but I can't figure out how to do it. Any help would be appreciated.

Answer:

Try:

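import re
import json
import requests
from bs4 import BeautifulSoup

url = "https://ruz.spbstu.ru/faculty/100/groups"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(response.text, "html.parser")
scripts = soup.find_all("script")

# scripts[3] holds `window.__INITIAL_STATE__ = {...};`. The greedy group
# captures from the first "{" up to the last "};" on that line.
data = re.search(r".*?({.*});", scripts[3].text).group(1)
data = json.loads(data)  # the captured object literal is valid JSON

# Groups of faculty 100 are a list of dicts under groups -> data -> "100"
out = {d["id"]: d["name"] for d in data["groups"]["data"]["100"]}
print(out)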

Prints:

{35754: '3733806/00301', 35715: '3763801/10103', ...etc.
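A slightly more defensive variant, sketched under the assumption that the page still assigns its state as window.__INITIAL_STATE__ = {...}; exactly as in the output above. Instead of relying on the hard-coded index scripts[3] and a greedy regex, it scans every script for that marker and lets json.JSONDecoder().raw_decode() stop at the end of the first complete JSON object, so the trailing ";" and any later JavaScript are ignored. The helper names parse_initial_state, find_state and groups_by_id are hypothetical, and the parsing part uses only the standard library so it can be tried on a plain string without fetching the page.

```python
import json

# Assumption: the page assigns its state as `window.__INITIAL_STATE__ = {...};`
MARKER = "window.__INITIAL_STATE__"


def parse_initial_state(script_text):
    """Return the object assigned to MARKER in script_text, or None."""
    start = script_text.find(MARKER)
    if start == -1:
        return None
    brace = script_text.find("{", start)
    if brace == -1:
        return None
    # raw_decode() parses one complete JSON value at the start of the string
    # and returns (value, end_index); anything after that value is ignored.
    obj, _end = json.JSONDecoder().raw_decode(script_text[brace:])
    return obj


def find_state(script_texts):
    """Return the first parsed state found in an iterable of script texts."""
    for text in script_texts:
        state = parse_initial_state(text)
        if state is not None:
            return state
    return None


def groups_by_id(state, faculty_id="100"):
    """Build {id: name} for one faculty; JSON object keys are strings."""
    groups = state["groups"]["data"].get(faculty_id) or []
    return {g["id"]: g["name"] for g in groups}


if __name__ == "__main__":
    # Shortened sample with the same nesting as the question's output.
    sample = (
        'window.__INITIAL_STATE__ = {"groups":{"isFetching":false,"data":'
        '{"100":[{"id":35754,"name":"3733806/00301"},'
        '{"id":35715,"name":"3763801/10103"}]}}};'
    )
    state = find_state(["var unrelated = 1;", sample])
    print(groups_by_id(state))  # {35754: '3733806/00301', 35715: '3763801/10103'}
```

With the real page, the scripts from the answer above would be passed in as find_state(s.text for s in scripts), followed by groups_by_id(state). Other faculties' groups, if present in the state, can be read by passing their id as a string, for example groups_by_id(state, "100").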