I'm using requests to download data from a website. The content is an EPG XML file packed in a gzip-compressed (.gz) file. I've been googling and trying all night with no success.
This is the relevant snippet of my current attempt. I've tried changing the encoding to UTF-8 and ISO-8859-1, but that just gives me a different kind of nonsense.
import xml.etree.ElementTree as ET
import requests
import gzip
url = 'http://example.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/102.0.0.0 Safari/537.36'
}
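def import_xml() -> list:
    try:
        response = requests.get(url, headers=headers, stream=True)
        print(response)
        print(response.headers['Content-Type'])
        print(response.encoding)
        # .text decodes the raw bytes as if they were text
        print(response.text[:100])
        # .content is the raw, still-compressed payload
        print(response.content[:100])
    except requests.RequestException as exc:
        # Report network/HTTP failures instead of leaving the try block open
        print(exc)

data = import_xml()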
This outputs the following:
<Response [200]>
application/x-gzip
None
��v�J�-�~�8��^��%�u�vuI�/�����z*��HX ���o�?��t����;s��I� ,[�K{��e�@�̌��1#�����4
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b\xec\xbd\xdbv\xdbJ\x92-\xfa~\xc68\xff\x80^\x0f\xed\xee\xb1%\x97u\xf1\xbavu\x0fI\xbe/\xcb\xf6\xb6\xb4\xec\xaaz*\x90\x84HX \xc0\x02\x08\xc9\xf4o\xec?\xd8\xfdt\x1e\xce\xf9\x88\xae\x1f;s\xce\xc8\x04I\x98\x00\x05,[\x1e\xbbK{\x8f\xaee\x9b@\x02\x88\xcc\x8c\x8c\x981#\xe2\xdf'
CodePudding user response:
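Gzipped content is generally served as application/gzip (application/x-gzip is an older, unofficial variant of the same type). Either way, requests only decompresses a response automatically when the server sends a Content-Encoding: gzip header. Here the gzip file is the payload itself, so you will have to decompress it manually.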
import gzip
result = gzip.decompress(response.content)
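If it helps, here is a rough sketch of the whole step from raw bytes to a parsed tree. It assumes the payload is a single gzip stream containing well-formed XML. The helper name parse_gzipped_xml and the tv/programme/title element names are made up for the demo and not taken from your feed, and the sample bytes stand in for response.content so the snippet runs without a network call. The check on the first two bytes relies on the gzip magic number \x1f\x8b, which is exactly what your response.content starts with.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_gzipped_xml(payload: bytes) -> ET.Element:
    """Decompress a gzip payload if needed and parse it as XML (hypothetical helper)."""
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
    # Passing bytes (not str) lets the parser honour the encoding
    # declared in the XML header, so no manual decoding is needed.
    return ET.fromstring(payload)


# Stand-in for response.content: a tiny made-up document, gzipped in memory.
sample_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<tv><programme channel="demo"><title>News</title></programme></tv>'
)
sample_payload = gzip.compress(sample_xml)

root = parse_gzipped_xml(sample_payload)
titles = [p.findtext("title") for p in root.iter("programme")]
print(titles)
```

In your function that would become root = parse_gzipped_xml(response.content). Working on the bytes rather than response.text is the important part: the text you printed was garbage because the compressed bytes were being decoded as characters, which is why switching encodings only changed the flavour of nonsense.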