I want to parse this webpage which contains symbols and Cyrillic letters. GET request return response with wrong encoding, which can not show these letters. What you recommend to prevent this response
import requests
url = "http://www.cawater-info.net/karadarya/1991/veg1991.htm"
response = requests.get(url)
print(response.encoding)
print(response.text[:100])
I tried to encode this text, but it did not help
print(response.text.encode('utf-8')[:100])
print(response.text.encode('cp852')[:100])
CodePudding user response:
Since the response contains some cyrillic alphabet, you need cp1251 to decode the content :
print(response.content.decode("cp1251")[:100]) # or windows-1251
#<HTML><HEAD><TITLE>Оперативные данные по водозаборам бассейна реки Карадарья на период вегетации 199