How can I download a file in Python whose link is hidden behind a JavaScript function?

I am trying to download a CSV file provided by the Download to CSV link here:

CodePudding user response：

You can try scraping the link with Beautiful Soup or Scrapy, then use that link to download the CSV. What libraries are you currently using?

CodePudding user response：

If you check with the browser's developer tools, you can see that clicking the button to generate the link sends a POST request to https://www.misoenergy.org/findapi/MisoWeb_Models_Find_RemoteHostedContentItem/_search

In Python, you can send the POST request with the requests library:

import requests

# requests.post has no payload= keyword; json= serializes the dict as a JSON body
response = requests.post('url_address', json=payload)

(Note: you may have to adjust the headers or cookies to get it to work.)

The payload is

{"partial_fields":{"source":{"exclude":["documentText"]}},"script_fields":{"documentTextExcerpt":{"params":{"field":"documentText","length":150},"script":"ascropped"}},"sort":[{"Updated":"desc"},{"Name":"asc"}],"query":{"filtered":{"filter":{"and":[{"query":{"term":{"ProjectNumber":"J1000"}}}]}}},"size":1000}
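As a minimal sketch, the payload above could be built as a Python dict so that the project number becomes a parameter. The helper names `build_payload` and `search_documents` are hypothetical, and sending the dict with `json=` assumes the endpoint expects a JSON body; extra headers or cookies may still be required.

```python
SEARCH_URL = (
    "https://www.misoenergy.org/findapi/"
    "MisoWeb_Models_Find_RemoteHostedContentItem/_search"
)


def build_payload(project_number, size=1000):
    """Return the search payload shown above for one project number."""
    return {
        "partial_fields": {"source": {"exclude": ["documentText"]}},
        "script_fields": {
            "documentTextExcerpt": {
                "params": {"field": "documentText", "length": 150},
                "script": "ascropped",
            }
        },
        "sort": [{"Updated": "desc"}, {"Name": "asc"}],
        "query": {
            "filtered": {
                "filter": {
                    "and": [{"query": {"term": {"ProjectNumber": project_number}}}]
                }
            }
        },
        "size": size,
    }


def search_documents(project_number, timeout=30):
    """POST the payload and return the decoded JSON (hypothetical helper)."""
    import requests  # third-party; imported lazily so build_payload works without it

    response = requests.post(
        SEARCH_URL, json=build_payload(project_number), timeout=timeout
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()
```

Calling `search_documents("J1000")` would then perform the same search the button triggers, assuming the server accepts the request without additional headers.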

Essentially, this payload searches by the project number taken from the data-projectnumber attribute of the <a> tag:

<a href="#" data-projectnumber="J1000" data-studygroup="East (ATC)" data-studycycle="DPP-2018-APR"><i  aria-hidden="true"></i></a>
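If you need the project numbers for many rows, one possible sketch is to collect every data-projectnumber attribute with the standard library's HTML parser. The class and function names here are hypothetical, and this assumes the <a> tags are present in the HTML you fetch; if the page builds them with JavaScript, they may not appear in the raw source.

```python
from html.parser import HTMLParser


class ProjectNumberParser(HTMLParser):
    """Collect data-projectnumber values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.project_numbers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; attribute names arrive lowercased
        value = dict(attrs).get("data-projectnumber")
        if value:
            self.project_numbers.append(value)


def extract_project_numbers(html):
    """Return all project numbers found in the given HTML string."""
    parser = ProjectNumberParser()
    parser.feed(html)
    return parser.project_numbers


sample = (
    '<a href="#" data-projectnumber="J1000" data-studygroup="East (ATC)" '
    'data-studycycle="DPP-2018-APR"><i aria-hidden="true"></i></a>'
)
print(extract_project_numbers(sample))  # prints ['J1000']
```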

Then, in the JSON response, the link to the document is at:

response.json()['hits']['hits'][0]['fields']['source']['SearchHitUrl$$string']
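A hedged sketch of the last step: pull the link out of each hit and save the file. The names `extract_document_urls` and `download` are hypothetical. Joining against `BASE_URL` is an assumption in case the returned link is site-relative; absolute links pass through unchanged.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.misoenergy.org"  # assumption: links may be site-relative


def extract_document_urls(result):
    """Return the document link of every hit in the decoded JSON response."""
    urls = []
    for hit in result.get("hits", {}).get("hits", []):
        link = hit.get("fields", {}).get("source", {}).get("SearchHitUrl$$string")
        if link:
            # urljoin leaves absolute URLs unchanged and resolves relative ones
            urls.append(urljoin(BASE_URL, link))
    return urls


def download(url, filename, timeout=30):
    """Save the document at url to filename (hypothetical helper)."""
    import requests  # third-party; imported lazily so the parsing helper works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(filename, "wb") as handle:
        handle.write(response.content)
```

Using `.get()` with defaults means a response with no hits, or a hit without the link field, yields an empty list rather than a KeyError.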