Why am I getting different responses when I use urllib.request.urlopen and requests.get?

Time:01-02


import requests
r = requests.get('https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg')
print(r.status_code)

Result: 403

from urllib.request import urlopen
r = urlopen('https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg')
print(r.getcode())

Result: 200
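One visible difference between the two calls is the User-Agent header each library sends when you don't set one: urllib sends "Python-urllib/X.Y" and requests sends "python-requests/X.Y.Z". The sketch below prints both defaults without making any network request. default_user_agents is a hypothetical helper name, and requests is treated as optional because it is a third-party package.

```python
import urllib.request


def default_user_agents() -> dict:
    """Return the User-Agent each library sends when none is given (hypothetical helper)."""
    agents = {}
    # urlopen() routes requests through an OpenerDirector like this one;
    # its addheaders list holds the default ('User-agent', 'Python-urllib/X.Y') pair.
    opener = urllib.request.build_opener()
    agents["urllib"] = dict(opener.addheaders).get("User-agent")
    try:
        import requests  # third-party; may not be installed
        agents["requests"] = requests.utils.default_user_agent()
    except ImportError:
        agents["requests"] = None
    return agents


for library, agent in default_user_agents().items():
    print(f"{library}: {agent}")
```

A server that filters on this header can therefore treat the two calls differently even though the URL is identical.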

Answer:

First, check print(r.content) to see what the server actually sent back.
The body usually contains an explanation that helps pinpoint the problem.
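The same check works with urllib, with one difference worth knowing: urlopen() does not return a 403 response, it raises urllib.error.HTTPError. So the 200 from r.getcode() in the question means urlopen genuinely succeeded. The exception is itself a response object, so its body can still be read. A minimal sketch; fetch_status_and_body is a hypothetical helper name and the 30-second timeout is an arbitrary choice.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def fetch_status_and_body(url: str, headers: dict[str, str] | None = None) -> tuple[int | None, bytes]:
    """Return (status, body), including the body of 4xx/5xx error pages.

    Unlike requests.get(), urlopen() raises HTTPError for error statuses;
    the exception carries the status code and the server's response body.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

For example, status, body = fetch_status_and_body(url) followed by print(status, body[:300]) shows the status and the start of the explanation, whether or not the request was refused.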


For your code, it shows a problem with the User-Agent header.

Wikipedia: User-Agent policy

Some servers check the User-Agent header to send different content to different systems, browsers, or devices. They also use it to detect scripts, bots, spammers, and hackers and block them.
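To make the mechanism concrete, here is a toy server-side sketch written as a WSGI app. The blocking rule (refuse a missing User-Agent or one starting with "python-requests/") is entirely hypothetical, chosen only to reproduce the symptom in the question; it is not how Wikimedia or any real site decides. status_for calls the app directly, so no network is involved.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical rule for illustration only; real servers apply their own checks.
BLOCKED_PREFIXES = ("python-requests/",)


def app(environ, start_response):
    """Toy WSGI app that refuses requests based on the User-Agent header."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if not user_agent or user_agent.startswith(BLOCKED_PREFIXES):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Please set a descriptive User-Agent header.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    # WSGI passes header values as latin-1 decoded strings.
    return [b"Hello, " + user_agent.encode("latin-1", "replace") + b"\n"]


def status_for(user_agent):
    """Call app() directly with a minimal environ and return its status line."""
    environ = {}
    if user_agent is not None:
        environ["HTTP_USER_AGENT"] = user_agent
    setup_testing_defaults(environ)  # fills in the standard WSGI keys
    statuses = []
    app(environ, lambda status, headers: statuses.append(status))
    return statuses[0]


for agent in ("python-requests/2.31.0", "Python-urllib/3.12", "Mozilla/5.0", None):
    print(repr(agent), "->", status_for(agent))
```

To try it over HTTP instead, wsgiref.simple_server.make_server('127.0.0.1', 8000, app).serve_forever() serves the same app locally.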

If I send the User-Agent of a real browser (or even just the short Mozilla/5.0), it works.

import requests

# Send a browser-like User-Agent instead of the default "python-requests/x.y.z"
headers = {'User-Agent': 'Mozilla/5.0'}

url = 'https://upload.wikimedia.org/wikipedia/commons/1/14/Sunset_Boulevard_(1950_poster).jpg'

r = requests.get(url, headers=headers)

print(r.status_code)
print(r.content[:100])  # first bytes of the body; a JPEG starts with b'\xff\xd8'

# Stop here if the server still refused, so an error page is not saved as image.jpg
r.raise_for_status()

with open('image.jpg', 'wb') as fh:
    fh.write(r.content)
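For comparison, the same download with urllib and an explicit header. As I understand Wikimedia's User-Agent policy (the page linked above), it asks scripts to identify themselves with a descriptive User-Agent that includes contact information rather than imitate a browser, so the sketch uses a string of that shape. USER_AGENT, build_request, and download are hypothetical names; replace the placeholder with your own tool name and contact address.

```python
import urllib.request

# Hypothetical placeholder: use your own tool name and contact address.
USER_AGENT = "MyImageFetcher/0.1 (contact: you@example.com)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, path: str, user_agent: str = USER_AGENT) -> int:
    """Save the resource at url to path and return the HTTP status.

    urlopen() raises urllib.error.HTTPError on 4xx/5xx responses,
    so an error page is never written to disk.
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as response:
        data = response.read()
        status = response.status
    with open(path, "wb") as fh:
        fh.write(data)
    return status
```

Usage: download(url, 'image.jpg') with the url from the requests example. Note that urllib stores header names capitalized, so the header is read back with req.get_header('User-agent').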

EDIT:

After running the code a few times, it started working for me even without the User-Agent header. Maybe the server blocked it for some other reason.
