Unable to download file from URL using python

Time:12-15

I am trying to download the file from the URL:

https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf

I tried the Python requests library, but the request just timed out. Sending my browser's 'User-Agent' as a header did not help, and neither did copying every single header from my browser into my Python script. Setting allow_redirects=True made no difference either. I've also tried wget and curl; everything fails except actually opening the browser, visiting the URL and downloading the file.

What is the actual difference between the request my browser sends and the Python request where I set the headers to match those in my browser? Is there any way I can download this file using Python?

Code snippet:

import requests
requests.get("https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf") # hangs

CodePudding user response:

Try this; it worked for me.

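import requests

# Send a browser-like User-Agent instead of the default python-requests one.
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}
url = "https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf"
# A timeout makes a stalled request raise an exception instead of hanging.
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
# The with-block closes the file even if the write fails.
with open("Chadv20-239.pdf", 'wb') as pdf:
    pdf.write(response.content)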

CodePudding user response:

It is difficult to tell what is going wrong without a code snippet. How is the file being downloaded? Are you getting the raw response content and saving that as a PDF? The official docs (https://docs.python-requests.org/en/latest/user/quickstart/#raw-response-content) suggest a chunk-based approach for saving streamed/raw content. Did you try that approach?
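
For reference, the chunk-based approach from the docs could look roughly like the sketch below. The function name download_file and its parameters are made up for illustration, and the timeout and chunk size are arbitrary choices rather than recommended values. Streaming keeps memory use low for large files, but it will not by itself get around a server that stalls the connection.

```python
import requests


def download_file(url, dest_path, headers=None, timeout=(5, 30), chunk_size=8192):
    """Stream url to dest_path in chunks; return the number of bytes written.

    Hypothetical helper for illustration only.
    """
    written = 0
    # stream=True defers fetching the body until iter_content() is consumed.
    # timeout is a (connect, read) pair in seconds, so a stalled request raises.
    with requests.get(url, headers=headers, timeout=timeout, stream=True) as response:
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        with open(dest_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
                written += len(chunk)
    return written
```

It could then be called as download_file(url, "Chadv20-239.pdf", headers=headers), reusing the url and headers from the question or the other answer.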
