Getting href links from a website using Python's Beautiful Soup module

Time:06-14

I am trying to get the href links from this devtools screenshot

Here, you can see the JSON response for all of the organizations that are being loaded into the page by client-side JS. If you take a look at the JSON, you'll notice that a link isn't one of the keys returned, but it's easily constructed using the WebsiteKey key.
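As a minimal sketch of that construction for a single record: the helper name `organization_link` and the sample `WebsiteKey` value below are hypothetical, and the base URL is assumed to be the organization page prefix used on this site.

```python
# Base URL of an organization's page (assumed prefix for this site).
ORGANIZATION_URL = "https://illinois.campuslabs.com/engage/organization/"


def organization_link(organization: dict) -> str:
    """Build an organization's page URL from its WebsiteKey field."""
    return ORGANIZATION_URL + organization["WebsiteKey"]


# Hypothetical record, trimmed down to the one key that matters here.
sample = {"WebsiteKey": "example-club"}
print(organization_link(sample))
```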

Putting all of this together:

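import requests

SEARCH_URL = "https://illinois.campuslabs.com/engage/api/discovery/search/organizations"
ORGANIZATION_URL = "https://illinois.campuslabs.com/engage/organization/"

search = "badminton"

# Query the same search endpoint that the page's client-side JS calls.
resp = requests.get(
    SEARCH_URL,
    params={"top": 10, "filter": "", "query": search, "skip": 0},
)
resp.raise_for_status()  # fail early on an HTTP error status

# The matching organizations are listed under the "value" key.
organizations = resp.json()["value"]

# Build each organization's page URL from its WebsiteKey.
links = [ORGANIZATION_URL + organization["WebsiteKey"] for organization in organizations]

print(links)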
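The request above only asks for the first 10 matches. If more are needed, one option is to page through the results. The sketch below assumes that `skip` and `top` behave as offset and page-size parameters and that a short page marks the end of the results; neither is confirmed here. `collect_links` and `fetch_from_site` are hypothetical helper names.

```python
from typing import Callable, Dict, List

SEARCH_URL = "https://illinois.campuslabs.com/engage/api/discovery/search/organizations"
ORGANIZATION_URL = "https://illinois.campuslabs.com/engage/organization/"


def collect_links(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 10,
    max_pages: int = 50,
) -> List[str]:
    """Gather organization links page by page.

    fetch_page(skip, top) must return the list of organization records
    for that page. Paging stops at the first short (or empty) page, or
    after max_pages requests as a safety limit.
    """
    links: List[str] = []
    for page in range(max_pages):
        organizations = fetch_page(page * page_size, page_size)
        links.extend(ORGANIZATION_URL + org["WebsiteKey"] for org in organizations)
        if len(organizations) < page_size:
            break
    return links


def fetch_from_site(query: str) -> Callable[[int, int], List[Dict]]:
    """Return a fetch_page function bound to one search query."""
    import requests  # imported here so collect_links can be used without it

    def fetch_page(skip: int, top: int) -> List[Dict]:
        # Assumption: "skip" is the offset and "top" is the page size.
        resp = requests.get(
            SEARCH_URL,
            params={"top": top, "filter": "", "query": query, "skip": skip},
        )
        resp.raise_for_status()
        return resp.json()["value"]

    return fetch_page
```

With this in place, `collect_links(fetch_from_site("badminton"))` would make the live requests. Passing the fetch function in as an argument also lets the paging logic be checked against canned data without touching the network.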

Similar strategies can be used to find and use other API endpoints on the site, such as the organization categories.
