I am trying to download a CSV file provided by the Download to CSV link here:
CodePudding user response:
You can try scraping the link with Beautiful Soup or Scrapy, then use that link to download the CSV. What libraries are you currently using?
CodePudding user response:
If you check with your browser's developer tools, you can see that clicking the button to generate the link sends a POST request to https://www.misoenergy.org/findapi/MisoWeb_Models_Find_RemoteHostedContentItem/_search
In Python, you can send the POST request with the requests library (json=payload sends the dictionary as a JSON request body):
import requests
response = requests.post('url_address', json=payload)
(Note that you may have to fiddle with the headers/cookies to get it to work.)
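As a rough sketch of what fiddling with the headers could look like, the helper below sends the POST through a session with a few browser-like headers. The header values and the `post_search` helper name are assumptions for illustration; which headers or cookies the site actually requires is not confirmed here, so compare against the request shown in developer tools.

```python
SEARCH_URL = (
    "https://www.misoenergy.org/findapi/"
    "MisoWeb_Models_Find_RemoteHostedContentItem/_search"
)

# Assumed headers: copy the real ones from the browser's network tab if these fail.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Content-Type": "application/json",
}


def post_search(session, payload, url=SEARCH_URL):
    """POST the payload as JSON and return the decoded response.

    `session` is expected to be a requests.Session (anything with a
    compatible .post method works), so cookies persist across calls.
    """
    response = session.post(
        url, json=payload, headers=BROWSER_LIKE_HEADERS, timeout=30
    )
    response.raise_for_status()  # surface 4xx/5xx errors early
    return response.json()


# Usage (performs a real network request, so it is left commented out):
# import requests
# with requests.Session() as session:
#     result = post_search(session, payload)
```

Passing the session in keeps any cookies the site sets between requests, and it makes the helper easy to try out with a stand-in object before hitting the real endpoint.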
The payload is
{"partial_fields":{"source":{"exclude":["documentText"]}},"script_fields":{"documentTextExcerpt":{"params":{"field":"documentText","length":150},"script":"ascropped"}},"sort":[{"Updated":"desc"},{"Name":"asc"}],"query":{"filtered":{"filter":{"and":[{"query":{"term":{"ProjectNumber":"J1000"}}}]}}},"size":1000}
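If you need the same search for other project numbers, the payload can be built as a Python dictionary instead of a hard-coded string. This is a minimal sketch; `build_payload` is a hypothetical helper name, and the structure simply mirrors the payload above.

```python
import json


def build_payload(project_number, size=1000):
    """Return the search payload for one project number (hypothetical helper)."""
    return {
        "partial_fields": {"source": {"exclude": ["documentText"]}},
        "script_fields": {
            "documentTextExcerpt": {
                "params": {"field": "documentText", "length": 150},
                "script": "ascropped",
            }
        },
        "sort": [{"Updated": "desc"}, {"Name": "asc"}],
        "query": {
            "filtered": {
                "filter": {
                    "and": [
                        {"query": {"term": {"ProjectNumber": project_number}}}
                    ]
                }
            }
        },
        "size": size,
    }


payload = build_payload("J1000")
# Print it in compact form to compare with the request body in developer tools.
print(json.dumps(payload, separators=(",", ":")))
```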
Essentially, this payload uses the project number from the data-projectnumber attribute of the <a> tag to search:
<a href="#" data-projectnumber="J1000" data-studygroup="East (ATC)" data-studycycle="DPP-2018-APR"><i aria-hidden="true"></i></a>
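To collect the project numbers from tags like this one, something along these lines might work using only the standard library. `ProjectNumberParser` is a hypothetical name, and the sketch assumes you already have the page's HTML as a string; if the page builds these tags with JavaScript, the HTML returned by a plain GET request may not contain them.

```python
from html.parser import HTMLParser


class ProjectNumberParser(HTMLParser):
    """Collect data-projectnumber values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.project_numbers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs with lowercased names.
        number = dict(attrs).get("data-projectnumber")
        if number:
            self.project_numbers.append(number)


html = (
    '<a href="#" data-projectnumber="J1000" data-studygroup="East (ATC)" '
    'data-studycycle="DPP-2018-APR"><i aria-hidden="true"></i></a>'
)
parser = ProjectNumberParser()
parser.feed(html)
print(parser.project_numbers)
```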
Then, in the JSON returned by the response, the link to the document is at
response.json()['hits']['hits'][0]['fields']['source']['SearchHitUrl$$string']
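Since the search can return more than one hit, a small helper that walks every hit is a bit more forgiving than indexing `[0]` directly. This is a sketch under the assumption that each hit has the same shape as the path above; `extract_document_urls` is a hypothetical name, and the sample dictionary uses a made-up URL purely for illustration.

```python
def extract_document_urls(result):
    """Return the SearchHitUrl$$string value of every hit that has one."""
    urls = []
    for hit in result.get("hits", {}).get("hits", []):
        source = hit.get("fields", {}).get("source", {})
        url = source.get("SearchHitUrl$$string")
        if url:
            urls.append(url)
    return urls


# Made-up response with the same nesting as the real one.
sample = {
    "hits": {
        "hits": [
            {"fields": {"source": {"SearchHitUrl$$string": "https://example.invalid/a.csv"}}},
            {"fields": {"source": {}}},  # a hit without a URL is skipped
        ]
    }
}
print(extract_document_urls(sample))
```

With a real response you would call `extract_document_urls(response.json())` and then download each URL with `requests.get`.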