Python - Download zip files with requests package but get unknown file format

Time:12-31

I am using Python 3.8.12. I tried the following code to download files from URLs with the requests package, but I get an 'Unknown file format' message when opening the downloaded zip file. I tested several different zip URLs, but every downloaded file is 18KB and none of them can be opened.

import requests

file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
file_download = requests.get(file_url, allow_redirects=True, stream=True)
# save_path and file_name are defined elsewhere in my script
with open(save_path + file_name, 'wb') as f:
    f.write(file_download.content)

Zip file opening error message

Zip files size

However, once I changed the URL to file_url = 'https://www.td.gov.hk/datagovhk_tis/mttd-csv/en/table41a_eng.csv', the code worked well and the CSV file was downloaded perfectly.

I have tried the requests, urllib, wget, and zipfile/io packages, but none of them worked.

The reason may be that the zip URL directs to both the zip file and a web page, while the csv URL directs to the csv file only.
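One way to test this guess is to look at the first bytes of what was actually downloaded. The snippet below is only a rough sketch: `sniff_payload` and `ZIP_SIGNATURES` are names made up for illustration, and the HTML check is a simple heuristic rather than a full content-type detector. Zip archives start with the bytes `PK`, while a web page usually starts with `<!DOCTYPE html` or `<html`.

```python
# Leading signatures used by zip archives (regular, empty, and spanned).
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def sniff_payload(data: bytes) -> str:
    """Classify raw bytes as 'zip', 'html', or 'unknown' from their first bytes."""
    if data.startswith(ZIP_SIGNATURES):
        return "zip"
    # Heuristic: ignore leading whitespace and letter case before checking for HTML.
    head = data.lstrip()[:64].lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Reading one of the 18KB files with `open(path, 'rb').read()` and passing the bytes to `sniff_payload` should show whether it is a real archive or a saved web page.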

I am really new to this field. Could anyone help with this? Thanks a lot!

CodePudding user response:

You can send a HEAD request and examine the response headers to get information about the file. The Content-Type header reveals the actual type of the resource:

import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
r = requests.head(file_url)
print(r.headers["Content-Type"])

gives output

text/html

So the URL you have actually points to an HTML page, not a zip file.
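The same check can be applied to the downloaded bytes before they are written to disk. This is a minimal sketch, not a complete downloader: `list_zip_members` is a hypothetical helper name, and it relies only on the standard library's `zipfile.is_zipfile`, which accepts a file-like object.

```python
import io
import zipfile
from typing import List


def list_zip_members(data: bytes) -> List[str]:
    """Return the member names of the zip archive in data, or raise ValueError."""
    buffer = io.BytesIO(data)
    # is_zipfile returns False for an HTML page or any other non-zip payload.
    if not zipfile.is_zipfile(buffer):
        raise ValueError("payload is not a zip archive (possibly an HTML page)")
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()
```

With requests, you could call `list_zip_members(requests.get(file_url).content)` and only save the content if no ValueError is raised. For the URL above it would raise, since the response body is an HTML page.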

CodePudding user response:

import wget

url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
#url = 'https://golang.org/dl/go1.17.3.windows-amd64.zip'
wget.download(url)