I have a script like this, where the username in the URL ends with a dot ("."):
import urllib.request
url = "https://likee.video/@evadecarle."
response = urllib.request.urlopen(url)
print(response)
The trailing dot "." in the URL seems to cause a problem.
If I change the URL to url = "https://likee.video/@11Happyness07.12",
it works fine.
How do I make it work with the trailing dot "."?
CodePudding user response:
If we try to fetch https://likee.video/@evadecarle. using urllib.request, we see:
>>> import urllib.request
>>> response = urllib.request.urlopen('https://likee.video/@evadecarle.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib64/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 302: Moved Temporarily
>>>
It's failing because the remote website is returning a 302 status code
(an HTTP redirect). Redirects are normally handled by an
HTTPRedirectHandler, something like:
>>> opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler(), urllib.request.HTTPHandler(debuglevel=0))
>>> resp = opener.open('https://google.com')
>>> resp.url
'https://www.google.com/'
Unfortunately, the URL https://likee.video/@evadecarle. is an odd one: it
returns a 302 status code, but doesn't include a Location: header
identifying the redirect target. Because of this, it looks like urllib
can't follow the redirect and raises the HTTPError shown above instead.
Someone else may correct me on this, but it looks like the requests
library handles this without a problem:
>>> import requests
>>> resp = requests.get('https://likee.video/@evadecarle.')
>>> resp
<Response [302]>
>>> resp.text[:80]
'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="robots" c'
So using the requests module may be the simplest solution.
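If adding a dependency isn't an option, one possible workaround with urllib alone is to catch the HTTPError and read the page from it, since the exception object also behaves like a response (it carries the status code, headers, and body). This is only a rough sketch: fetch_body and its timeout parameter are names I made up for illustration, and it assumes the server sends the page content along with the 302, as the requests output above suggests.

```python
import urllib.error
import urllib.request


def fetch_body(url, timeout=10):
    """Return (status, body_bytes) for url, even when urlopen raises HTTPError.

    fetch_body is a hypothetical helper, not part of urllib.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # A 302 without a Location header ends up here. HTTPError doubles
        # as a response object, so the status and body are still readable.
        return err.code, err.read()


# Example call (needs network access, so it is left commented out):
# status, body = fetch_body("https://likee.video/@evadecarle.")
# print(status, body[:80])
```

Note that this swallows every HTTP error status, not just the odd 302, so the caller should check the returned status before trusting the body.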