Why am I getting different responses when I use urllib.request.urlopen and requests.get?
import requests
r = requests.get('https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg')
r.status_code
response 403
from urllib.request import urlopen
r = urlopen('https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg')
r.getcode()
response 200
CodePudding user response:
First, you could check print(r.content) to see what you get from the server.
Usually the body contains an explanation that helps identify the problem.
For your code, it shows a problem with the User-Agent header.
Wikipedia: User-Agent policy
Some servers check the User-Agent header to send different content for different systems/browsers/devices. They also use it to detect scripts/bots/spammers/hackers and block them.
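One possible reason the two libraries are treated differently is that each sends its own default User-Agent. A minimal sketch to print both defaults without making any network request (default_user_agents is a hypothetical helper name, and the requests part is skipped if that package is not installed):

```python
import urllib.request


def default_user_agents():
    """Return the User-Agent each library sends when none is set explicitly."""
    agents = {}

    # urllib keeps its default headers on the opener object
    # as a list of (name, value) tuples.
    opener = urllib.request.build_opener()
    agents['urllib'] = dict(opener.addheaders).get('User-agent')

    # requests is a third-party package, so guard the import.
    try:
        import requests
        agents['requests'] = requests.utils.default_user_agent()
    except ImportError:
        pass

    return agents


for library, agent in default_user_agents().items():
    print(library, '->', agent)
```

The exact version numbers in the output depend on your installation. After a real request with requests, you can also inspect r.request.headers to see exactly what was sent.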
If I use a User-Agent header from a real browser (or at least the short Mozilla/5.0), then it works.
import requests
# Browser-like User-Agent; the short 'Mozilla/5.0' was enough in my test
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg'
r = requests.get(url, headers=headers)
print(r.status_code)
print(r.content[:100])  # first bytes of the body, to check what was returned
# Save the response body to a file
with open('image.jpg', 'wb') as fh:
    fh.write(r.content)
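The same header can be set with urllib as well. This is only a sketch: the USER_AGENT value, make_request and download are hypothetical names for illustration. As far as I understand the User-Agent policy linked above, a descriptive string with contact information is preferred over pretending to be a browser, so the example uses one.

```python
from urllib.request import Request, urlopen

# Hypothetical identification string; replace it with your own
# tool name and contact details.
USER_AGENT = 'ExampleImageFetcher/0.1 (contact@example.org)'


def make_request(url, user_agent=USER_AGENT):
    """Build a urllib Request that carries an explicit User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def download(url, path, user_agent=USER_AGENT):
    """Fetch url with the given User-Agent, save the body, return the status."""
    with urlopen(make_request(url, user_agent)) as response:
        status = response.status
        data = response.read()
    with open(path, 'wb') as fh:
        fh.write(data)
    return status


# Usage (performs a real network request, so it is left commented out):
# download('https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg', 'image.jpg')
```

Note that urllib normalizes header names, so the header is stored as 'User-agent' on the Request object.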
EDIT:
After running the code a few times, it started working for me even without the User-Agent header. Maybe they were blocking it for some other reason.