I'm starting to learn about reading data from a website. But when I try to read data from google.com I encounter this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 279: invalid continuation byte
Below is my code (exactly as in the instructional video, just with a different website):
import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('https://www.google.com/')
for line in fhand:
    print(line.decode().strip())
What is wrong? Thanks in advance.
Answer:
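Specifying the encoding and an error handler should solve the problem. The page is apparently not valid UTF-8, so with errors="backslashreplace" any byte that can't be decoded is written out as an escape such as \xe0 instead of raising UnicodeDecodeError: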
import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('https://www.google.com/')
for line in fhand:
    print(line.decode(encoding="utf-8", errors="backslashreplace").strip())
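To see what is going on without touching the network, the sketch below decodes an illustrative byte string (made up for the demo, not Google's actual response). 0xe0 starts a 3-byte UTF-8 sequence, but the byte after it is a space (0x20) rather than a continuation byte, which is exactly the "invalid continuation byte" error. The same byte is "à" in ISO-8859-1, so a page served in that charset trips over it. It also sketches an alternative, assuming the server declares its charset in the Content-Type header: read it with response.headers.get_content_charset() and decode with that. The helper names decode_line and read_page are hypothetical, and read_page is not called in the demo because it needs a network connection.

```python
import urllib.request

# Illustrative bytes only: valid ISO-8859-1 ("voilà x"), invalid UTF-8.
raw = b"voil\xe0 x"


def decode_line(data, charset=None):
    """Decode bytes with the given charset, falling back to UTF-8.

    errors="backslashreplace" turns undecodable bytes into escapes
    like \\xe0 instead of raising UnicodeDecodeError.
    """
    return data.decode(charset or "utf-8", errors="backslashreplace")


def read_page(url):
    """Fetch url and decode each line using the charset from Content-Type."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None if the header names no charset,
        # in which case decode_line falls back to UTF-8.
        charset = response.headers.get_content_charset()
        return [decode_line(line, charset).strip() for line in response]


if __name__ == "__main__":
    try:
        raw.decode("utf-8")  # strict decoding raises, as in the question
    except UnicodeDecodeError as exc:
        print("strict:", exc)
    print("backslashreplace:", decode_line(raw))  # prints: voil\xe0 x
    print("latin-1:", decode_line(raw, "latin-1"))  # prints: voilà x
```

backslashreplace keeps the program running but leaves escapes in the text; decoding with the declared charset gives the real characters when the server reports it correctly.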
If you are learning web scraping with Python, you might want to have a look at BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/