Find non-placeholder image when web scraping with Python

Time:10-15

I want to get an image from a website, but the website shows a placeholder image before it loads the image I want. My program's logic depends on which image is shown.

This is the code:

import requests
from bs4 import BeautifulSoup


def main():
    # Download the page exactly as the server sends it (requests does not run JavaScript)
    r = requests.get("https://www.simcoecountyschoolbus.ca/")
    r.raise_for_status()
    soup = BeautifulSoup(r.content, "html.parser")

    # src attribute of the first <img> inside each status-icon container
    northdiv = soup.find("div", id="status-icon-north")
    statusNorth = northdiv.select("img")[0].get("src")

    westdiv = soup.find("div", id="status-icon-west")
    statusWest = westdiv.select("img")[0].get("src")

    print(statusNorth)
    print(statusWest)


if __name__ == "__main__":
    main()

I want to get the image "images/status-none.png", but it returns "images/status-some.png".

Answer:

It looks like JavaScript updates that image after the initial page load, so requests only ever sees the placeholder that is in the original HTML. You'd be better off requesting the JSON data that the page loads from this backend endpoint: https://www.simcoecountyschoolbus.ca/status.json
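A minimal sketch of requesting that endpoint directly. It uses the standard library's `urllib.request` so it runs without extra packages; with the question's existing dependency, `requests.get(url).json()` does the same job. The browser-like `User-Agent` header is an assumption (in case the server rejects default clients), and the helper names `fetch_status` and `show_status` are hypothetical. The JSON's field names are not known here, so `show_status` simply pretty-prints the whole payload for inspection.

```python
import json
import urllib.request

# Endpoint taken from the answer; the JSON layout behind it is not assumed
STATUS_URL = "https://www.simcoecountyschoolbus.ca/status.json"


def fetch_status(url: str = STATUS_URL, timeout: float = 10.0):
    """Download the endpoint and decode its JSON body into Python objects."""
    # Browser-like User-Agent: an assumption, in case the server rejects default clients
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def show_status(url: str = STATUS_URL) -> str:
    """Return the payload pretty-printed so you can see which fields carry the status."""
    return json.dumps(fetch_status(url), indent=2)
```

Run `print(show_status())` once, find the field that corresponds to the north or west icon, and then base the program's logic on that value instead of on the `<img>` `src`.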

To find this endpoint, open your browser's Developer Tools, go to the Network tab, select the Fetch/XHR filter, and refresh the page. There you will see the backend API requests that load data after the initial page load. If you click the one named "status", you'll see the endpoint URL as well as the response, which you can inspect for the data you want.
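If the response is large, a small helper can list every field with its value so the one that tracks the status icons is easier to spot. This is a sketch with hypothetical names (`iter_leaves`, `leaf_lines`) that makes no assumption about the JSON's structure: it walks dicts and lists recursively and reports the path to each plain value.

```python
from typing import Any, Iterator, List, Tuple


def iter_leaves(obj: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Yield (path, value) for every non-container value in decoded JSON."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_leaves(value, path + (index,))
    else:
        # Strings, numbers, booleans and null end the recursion
        yield path, obj


def leaf_lines(obj: Any) -> List[str]:
    """Format each leaf as 'key/0/subkey = value' for quick scanning."""
    return [f"{'/'.join(map(str, path))} = {value}" for path, value in iter_leaves(obj)]
```

For example, with `text` holding the response body copied from the Network tab, `for line in leaf_lines(json.loads(text)): print(line)` prints one line per field.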
