Cannot get a response from urllib.request.urlopen with a URL ending with a dot

Time:05-11

I have a script like this, where the URL contains a username ending with a dot ".".

import urllib.request

url = "https://likee.video/@evadecarle."
response = urllib.request.urlopen(url)
print(response)

The trailing dot "." in the URL seems to cause the problem. If I change the URL to "https://likee.video/@11Happyness07.12", it works fine. How do I make it work with the trailing dot?

CodePudding user response:

If we try to fetch https://likee.video/@evadecarle. using urllib.request, we see:

>>> import urllib.request
>>> response = urllib.request.urlopen('https://likee.video/@evadecarle.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib64/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 302: Moved Temporarily
>>>

It's failing because the remote website is returning a 302 status code (an HTTP redirect). Normally, urllib follows redirects through an HTTPRedirectHandler, which the default opener used by urlopen already includes. Spelled out explicitly, that looks something like:

>>> opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler(), urllib.request.HTTPHandler(debuglevel=0))
>>> resp = opener.open('https://google.com')
>>> resp.url
'https://www.google.com/'

Unfortunately, the URL https://likee.video/@evadecarle. is an odd one: it returns a 302 status code, but doesn't include a Location: header identifying the redirect target.

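Because of this, urllib's redirect handler has no target to follow, so the response falls through to the default error handler (http_error_default in the traceback above), which raises HTTPError. Someone else may correct me on this, but it looks like the requests library handles this without a problem: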

>>> import requests
>>> resp = requests.get('https://likee.video/@evadecarle.')
>>> resp
<Response [302]>
>>> resp.text[:80]
'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="robots" c'

So using the requests module may be the simplest solution.
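
If adding a dependency is not an option, the standard library can probably still get at the page, because the HTTPError that urllib raises is itself a response object carrying the status code, headers, and body. The following is a minimal sketch under that assumption; the helper name fetch_even_on_error is hypothetical and not part of urllib, and whether the body the site sends with its 302 is the page you actually want is something to check by hand.

```python
import urllib.error
import urllib.request


def fetch_even_on_error(url, timeout=10):
    """Return (status, headers, body) for url.

    Hypothetical helper: when urllib raises HTTPError (for example on a
    302 response that has no Location header), the error object is read
    like a normal response instead of being propagated.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.headers, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError exposes the status code, the response headers,
        # and the body that the server sent along with the error.
        body = err.read()
        err.close()
        return err.code, err.headers, body


# Example usage (requires network access, so it is left commented out):
# status, headers, body = fetch_even_on_error("https://likee.video/@evadecarle.")
# print(status, headers.get("Location"), body[:80])
```

Unlike requests, this returns raw bytes, so decode the body yourself (for example with body.decode("utf-8")) if you need text.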
