I am trying to download an image from Wikipedia and save it to a file locally (using Python 3.9.x). Following this link I tried:
import urllib.request
http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
However, when I try to open this file (on macOS), I get an error: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.
I searched further and came across this article, which suggests modifying the User-Agent. Following that, I modified the above code as follows:
import urllib.request
opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0')]
urllib.request.install_opener(opener)
http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
However, modifying the User-Agent did NOT help and I still get the same error while trying to open the file: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.
Another piece of information: the downloaded file (that does not open) is 235 KB. But if I download the image manually (Right Click -> Save Image As...) it is 455 KB.
What else am I missing? Thank you!
CodePudding user response:
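The problem is that you are downloading a web page and saving it with a .jpg extension. The URL you used is not a direct link to the image; it is the Wikipedia article page, which merely displays the photograph. The part after # is a fragment that the browser uses to open the media viewer and is not sent to the server, so urlretrieve saves the article's HTML. That is why the file you download is 235 KB while the actual image is 455 KB.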
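As a quick illustration of the point above, here is a small sketch that splits the original URL with urllib.parse.urlsplit. The path is what the server is asked for, and the fragment stays on the client side:

```python
from urllib.parse import urlsplit

url = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
parts = urlsplit(url)

# The path identifies the resource the server returns: the article page.
print(parts.path)      # /wiki/Abacus
# The fragment is handled by the client and is not part of the HTTP request.
print(parts.fragment)  # /media/File:Abacus_4.jpg
```

So the request is effectively for https://en.wikipedia.org/wiki/Abacus, which is an HTML document.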
Instead of this:
http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
Use this:
http = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
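To find the direct URL of any image, open it with your browser's "Open image in new tab" option and copy the URL from the address bar. Note that the URL above points to an 800px-wide thumbnail, so the saved file may not be the same size as the copy you saved manually.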
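To catch this kind of mistake early, one option is to check the Content-Type header before writing the file. The following is only a sketch: the names is_image_content_type and download_image and the User-Agent string 'example-downloader/0.1' are made up for illustration. The header is set explicitly on the assumption that the server may reject requests carrying a generic default User-Agent.

```python
import shutil
import urllib.request


def is_image_content_type(content_type):
    """Return True if a Content-Type header value names an image type."""
    media_type = content_type.split(';')[0].strip().lower()
    return media_type.startswith('image/')


def download_image(url, path, user_agent='example-downloader/0.1'):
    """Save url to path, but only if the server reports an image type.

    The default user_agent is a placeholder; replace it with a string
    that describes your own script.
    """
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get('Content-Type', '')
        if not is_image_content_type(content_type):
            # For the article URL this would report something like text/html.
            raise ValueError(f'Expected an image, got {content_type!r}')
        with open(path, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)
```

Calling download_image with the upload.wikimedia.org URL above and 'test.jpg' should write the image, while calling it with the original en.wikipedia.org URL should raise ValueError instead of silently saving HTML under a .jpg name.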