Downloading an image from the web and saving

I am trying to download an image from Wikipedia and save it to a file locally (using Python 3.9.x). Following this link I tried:

import urllib.request

http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

However, when I try to open this file on macOS, I get an error: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

I searched further and came across this article, which suggests modifying the User-Agent. Following that, I modified the above code as follows:

import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

However, modifying the User-Agent did NOT help and I still get the same error while trying to open the file: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

Another piece of information: the downloaded file (that does not open) is 235 KB. But if I download the image manually (Right Click -> Save Image As...) it is 455 KB.

What else am I missing? Thank you!

CodePudding user response:

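The problem is that the URL you used is not a link to the image file. It is the address of the Wikipedia article page, which merely displays the photograph. The part after the # (/media/File:Abacus_4.jpg) is a fragment that the browser handles itself and that is never sent to the server, so urlretrieve saves the HTML of the Abacus article under the name test.jpg. That is why the file you downloaded is 235 KB while the actual photo is 455 KB, and why Preview cannot open it.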
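To see which part of that URL the server is actually asked for, you can split it with the standard library. This is only an illustrative sketch; the variable names page_url and parts are mine, not from the question.

```python
from urllib.parse import urlsplit

# The URL from the question: an article page plus a client-side fragment.
page_url = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
parts = urlsplit(page_url)

# The path is what gets requested from the server; the fragment stays
# in the client and is not part of the HTTP request.
print(parts.path)      # /wiki/Abacus
print(parts.fragment)  # /media/File:Abacus_4.jpg
```

So the request goes to /wiki/Abacus, which is an HTML page, regardless of the .jpg that appears after the #.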

Instead of this:

http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

Use this:

import urllib.request
# Direct URL of the image file itself, not of the article page
http = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

To get a direct image URL, right-click the photo in your browser, choose "Open image in new tab", and copy the URL from the address bar of that tab.
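If you want the script itself to catch this kind of mistake, one option is to check the Content-Type header of the response before writing anything to disk. The following is a minimal sketch, not the only way to do it. The helper names is_image_content_type and download_image and the User-Agent string 'example-downloader/0.1' are made up for illustration; substitute a User-Agent that identifies your own script.

```python
import urllib.request


def is_image_content_type(content_type):
    """Return True if a Content-Type header value names an image type."""
    media_type = content_type.split(';')[0].strip().lower()
    return media_type.startswith('image/')


def download_image(url, path, user_agent='example-downloader/0.1'):
    """Save url to path, refusing responses that are not images.

    The default user_agent is a placeholder chosen for this example.
    """
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get('Content-Type', '')
        if not is_image_content_type(content_type):
            raise ValueError(f'Expected an image, got {content_type!r}')
        data = response.read()
    with open(path, 'wb') as f:
        f.write(data)
    return len(data)  # number of bytes written

# Example call (performs a network request):
# download_image('https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg', 'test.jpg')
```

With the article URL from the question, the server would be expected to answer with an HTML content type, so download_image would raise a ValueError instead of silently saving a web page as test.jpg.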