I am trying to download the most recent zip file from the ERCOT Website (https://www.ercot.com/mp/data-products/compliance-and-disclosure/?id=NP3-965-ER). However, the link of the zip file has a doclookup id that changes everytime. The id is also populated dynamically. I have tried using beautifulsoup to get the link, but since it's being loaded dynamically it is not providing any links. Any feedback or solutions will be appreciated. enter image description here
CodePudding user response:
Using the exposed api:
import json
import pandas as pd
import pendulum
import requests
def get_document_id(type_id: int) -> int:
url = (
"https://www.ercot.com/misapp/servlets/IceDocListJsonWS?"
f"reportTypeId={type_id}&"
f"_={pendulum.now().format('X')}"
)
with requests.Session() as request:
response = request.get(url, timeout=10)
if response.status_code != 200:
print(response.raise_for_status())
data = json.loads(response.text)
return pd.json_normalize(data=data["ListDocsByRptTypeRes"], record_path="DocumentList").head(1)["Document.DocID"].squeeze()
id_number = get_document_id(13052)
print(id_number)
869234127