Some image URLs work while most don't (Pillow)

I am trying to make some image filters for my API. Some URLs work while most do not. I want to know why and how to fix it. I have looked through another Stack Overflow post but have not had much luck, as I don't know what the problem is.

Here is my code

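import io
import urllib.request

from PIL import Image


def generate_image_Wanted(imageUrl):
    # Download the image into an in-memory buffer
    with urllib.request.urlopen(imageUrl) as url:
        f = io.BytesIO(url.read())

    im1 = Image.open("images/wanted.jpg")
    im2 = Image.open(f)
    im2 = im2.resize((300, 285))

    # Paste the resized image onto a copy of the poster
    img = im1.copy()
    img.paste(im2, (85, 230))

    # Save the result to a buffer and rewind it for the caller
    d = io.BytesIO()
    img.save(d, "PNG")
    d.seek(0)
    return d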

Here is my error

Traceback (most recent call last):
  File "c:\Users\micha\OneDrive\Desktop\MicsAPI\test.py", line 23, in <module>
    generate_image_Wanted("https://cdn.discordapp.com/avatars/902240397273743361/9d7ce93e7510f47da2d8ba97ec32fc33.png")
  File "c:\Users\micha\OneDrive\Desktop\MicsAPI\test.py", line 11, in generate_image_Wanted
    with urllib.request.urlopen(imageUrl) as url:
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Thank you for looking at this and have a good day.

CodePudding user response:

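The sites you can't fetch from probably have server-side protection against known bots and spiders, and they block requests that come from urllib.

You need to send some headers with the request, in particular a User-Agent. See the documentation for the Python request library for more.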

Working example:

import urllib.request

# Send a browser-like User-Agent instead of urllib's default one
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
url = "https://cdn.discordapp.com/avatars/902240397273743361/9d7ce93e7510f47da2d8ba97ec32fc33.png"

req = urllib.request.Request(url, headers=hdr)
with urllib.request.urlopen(req) as response:
    data = response.read()  # raw image bytes
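
To tie this back to the function in the question, here is a minimal sketch of a download helper that always sends a User-Agent and returns an in-memory buffer. The names build_request, fetch_image_bytes and DEFAULT_USER_AGENT are made up for illustration, and the User-Agent string is just the one from the example above; whether a given server accepts it is an assumption.

```python
import io
import urllib.error
import urllib.request

# Assumed browser-like User-Agent (hypothetical default); any value the
# server accepts should work just as well.
DEFAULT_USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"


def build_request(image_url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that sends an explicit User-Agent header."""
    return urllib.request.Request(image_url, headers={"User-Agent": user_agent})


def fetch_image_bytes(image_url, timeout=10):
    """Download image_url and return its body as a BytesIO positioned at 0."""
    try:
        with urllib.request.urlopen(build_request(image_url), timeout=timeout) as response:
            return io.BytesIO(response.read())
    except urllib.error.HTTPError as err:
        # Surface the status code so a 403 is easy to tell apart from a 404.
        raise RuntimeError(f"{image_url} returned HTTP {err.code}") from err
```

Inside generate_image_Wanted, the urlopen block could then be replaced by f = fetch_image_bytes(imageUrl), and f passed to Image.open as before. If a site still answers 403 with these headers, it may be checking something else (cookies, a Referer header, rate limits), which this sketch does not handle.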