How can I download a file in Python whose link is hidden behind a Javascript function?

Time:10-27

I am trying to download the CSV file behind a "Download to CSV" link on a web page.

User response:

You can try scraping the link with Beautiful Soup or Scrapy, then use that link to download the CSV. Which libraries are you currently using?
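A minimal sketch of that approach. It uses the standard library's html.parser in place of Beautiful Soup so it has no third-party dependencies. It assumes the CSV link appears as an ordinary <a href="...csv"> in the page's static HTML; a link that JavaScript builds at click time will not be found this way. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve


class CsvLinkParser(HTMLParser):
    """Collects href values of <a> tags that point at .csv files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links whose path ends in .csv (ignoring any query string).
        if href and href.split("?", 1)[0].lower().endswith(".csv"):
            self.links.append(href)


def find_csv_links(html, base_url):
    """Return absolute URLs of all .csv links found in the static HTML."""
    parser = CsvLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def download(url, filename):
    """Save the file at url to filename on disk."""
    urlretrieve(url, filename)
```

If find_csv_links returns an empty list for the real page, that suggests the URL is generated by JavaScript, which is the case the next answer deals with.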

User response:

If you inspect the page with your browser's developer tools (Network tab), you can see that clicking the button to generate the link sends a POST request to https://www.misoenergy.org/findapi/MisoWeb_Models_Find_RemoteHostedContentItem/_search

In Python, you can send the POST request with the requests library:

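import requests

# payload is the JSON object shown below; json= serialises it and sets Content-Type: application/json
response = requests.post('https://www.misoenergy.org/findapi/MisoWeb_Models_Find_RemoteHostedContentItem/_search', json=payload)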

(Note: you may need to adjust the headers or cookies to get it to work.)

The payload is

{"partial_fields":{"source":{"exclude":["documentText"]}},"script_fields":{"documentTextExcerpt":{"params":{"field":"documentText","length":150},"script":"ascropped"}},"sort":[{"Updated":"desc"},{"Name":"asc"}],"query":{"filtered":{"filter":{"and":[{"query":{"term":{"ProjectNumber":"J1000"}}}]}}},"size":1000}
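Since only the ProjectNumber term changes from one download to the next, it can be convenient to build this payload as a Python dict instead of pasting the JSON string. A minimal sketch; build_payload is a hypothetical helper name, and every field is copied from the payload above.

```python
import json


def build_payload(project_number, size=1000):
    """Rebuild the search payload shown above for a given project number."""
    return {
        "partial_fields": {"source": {"exclude": ["documentText"]}},
        "script_fields": {
            "documentTextExcerpt": {
                "params": {"field": "documentText", "length": 150},
                "script": "ascropped",
            }
        },
        "sort": [{"Updated": "desc"}, {"Name": "asc"}],
        "query": {
            "filtered": {
                "filter": {
                    "and": [
                        {"query": {"term": {"ProjectNumber": project_number}}}
                    ]
                }
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    # Compact separators reproduce the exact JSON string shown above.
    print(json.dumps(build_payload("J1000"), separators=(",", ":")))
```

The resulting dict can be passed straight to requests.post(..., json=build_payload("J1000")).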

Essentially, the payload searches on the project number taken from the data-projectnumber attribute of the link's <a> tag:

<a href="#" data-projectnumber="J1000" data-studygroup="East (ATC)" data-studycycle="DPP-2018-APR"><i  aria-hidden="true"></i></a>
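To avoid hard-coding the project number, the data-* attributes can be read from the page HTML. A sketch using the standard library's html.parser (Beautiful Soup would work just as well); it assumes these <a> tags are present in the HTML as served rather than injected later by JavaScript, and the class, function, and dict key names are illustrative.

```python
from html.parser import HTMLParser


class ProjectLinkParser(HTMLParser):
    """Collects the data-* attributes of <a> tags that carry a project number."""

    def __init__(self):
        super().__init__()
        self.projects = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "data-projectnumber" in attributes:
            self.projects.append({
                "project_number": attributes["data-projectnumber"],
                "study_group": attributes.get("data-studygroup"),
                "study_cycle": attributes.get("data-studycycle"),
            })


def find_projects(html):
    """Return one dict per download link found in the HTML."""
    parser = ProjectLinkParser()
    parser.feed(html)
    return parser.projects
```

Each project_number found this way is the value that goes into the ProjectNumber term of the payload.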

In the returned JSON, the link to the document is:

response.json()['hits']['hits'][0]['fields']['source']['SearchHitUrl$$string']
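Putting the steps together: a sketch that sends the search request and extracts the document URL from the response. It uses the standard library's urllib.request instead of requests so it runs without extra packages. The endpoint and the response path come from this answer; the helper names, the Content-Type header, returning None when there are no hits, and resolving a possibly relative link with urljoin are assumptions. As noted above, the site may also require specific headers or cookies.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen, urlretrieve

SEARCH_URL = ("https://www.misoenergy.org/findapi/"
              "MisoWeb_Models_Find_RemoteHostedContentItem/_search")


def document_url(result):
    """Return the document link from a parsed search response, or None if no hits."""
    hits = result.get("hits", {}).get("hits", [])
    if not hits:
        return None
    # Path taken from the answer above; it may change if the site's API changes.
    return hits[0]["fields"]["source"]["SearchHitUrl$$string"]


def search(payload):
    """POST the payload as JSON and return the parsed JSON response."""
    request = Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)


def download_document(payload, filename):
    """Search, then save the first matching document to filename."""
    url = document_url(search(payload))
    if url is None:
        raise ValueError("search returned no documents")
    # In case the returned link is relative (not confirmed by the answer).
    url = urljoin(SEARCH_URL, url)
    urlretrieve(url, filename)
    return url
```

Here payload is the JSON object from this answer loaded as a dict, for example json.loads() of the string shown earlier.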