Python3: Decoding an 'application/x-gzip' response with Requests

Time:11-25

I'm using requests to download data from a website. The content is an EPG XML file packed in a compressed gz file. I've been googling and trying all night with no success.

This is the relevant snippet of my current code. I've tried changing the encoding to UTF-8 and ISO-8859-1, but that just gives me a different kind of nonsense.

import xml.etree.ElementTree as ET
import requests
import gzip

url = 'http://example.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/102.0.0.0 Safari/537.36'
}


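def import_xml() -> list:
    try:
        response = requests.get(url, headers=headers, stream=True)
        # Diagnostics: status, declared media type, detected charset, body preview
        print(response)
        print(response.headers['Content-Type'])
        print(response.encoding)
        print(response.text[:100])
        print(response.content[:100])
    except requests.RequestException as exc:
        print(f'Download failed: {exc}')
    return []  # parsing the XML into a list is not implemented yet

data = import_xml()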

This outputs the following:

<Response [200]>
application/x-gzip
None
      ��v�J�-�~�8��^��%�u�vuI�/�����z*��HX ���o�?��t����;s��I� ,[�K{��e�@�̌��1#�����4   
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b\xec\xbd\xdbv\xdbJ\x92-\xfa~\xc68\xff\x80^\x0f\xed\xee\xb1%\x97u\xf1\xbavu\x0fI\xbe/\xcb\xf6\xb6\xb4\xec\xaaz*\x90\x84HX \xc0\x02\x08\xc9\xf4o\xec?\xd8\xfdt\x1e\xce\xf9\x88\xae\x1f;s\xce\xc8\x04I\x98\x00\x05,[\x1e\xbbK{\x8f\xaee\x9b@\x02\x88\xcc\x8c\x8c\x981#\xe2\xdf'
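The printed bytes begin with b'\x1f\x8b\x08', which is the gzip magic number followed by the deflate method byte (RFC 1952). The short sketch below uses a made-up stand-in body rather than the real download to show why changing the encoding cannot work: decoding compressed bytes as text only changes which garbage you get, while decompressing first recovers the XML.

```python
import gzip

# Stand-in for the downloaded body (assumption): a tiny XML document, gzip-compressed
xml = b'<?xml version="1.0" encoding="UTF-8"?><tv></tv>'
body = gzip.compress(xml)

# RFC 1952 header: ID1=0x1f, ID2=0x8b, then CM=8 (deflate)
print(body[:3])  # b'\x1f\x8b\x08'

# Decoding the compressed bytes as text never yields XML
as_utf8 = body.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
as_latin1 = body.decode("iso-8859-1")             # never fails, still not XML
print(as_utf8.startswith("<?xml"), as_latin1.startswith("<?xml"))  # False False

# Decompress first, then decode the XML bytes normally
print(gzip.decompress(body).decode("utf-8").startswith("<?xml"))  # True
```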

Answer:

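The response body is not text at all but a gzip file: the bytes above start with \x1f\x8b, the gzip magic number, so no choice of encoding will turn it into XML. requests only decompresses automatically when the server sends a Content-Encoding header such as gzip or deflate; it does not act on Content-Type. Here the Content-Type is application/x-gzip (the legacy name for application/gzip), which says the payload itself is a .gz file, so you have to decompress it yourself with the gzip module.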
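The header rule described above can be sketched as a small helper. The name payload_bytes is hypothetical, and it assumes content is what requests exposes as response.content, i.e. any Content-Encoding has already been undone. It decompresses only when the media type says "gzip file" and the bytes actually carry the gzip magic number, so an already-decompressed body passes through unchanged.

```python
import gzip
from typing import Mapping

GZIP_MAGIC = b"\x1f\x8b"
# application/gzip is registered in RFC 6713; application/x-gzip is the legacy alias
GZIP_CONTENT_TYPES = {"application/gzip", "application/x-gzip"}


def payload_bytes(content: bytes, headers: Mapping[str, str]) -> bytes:
    """Return the payload, gunzipping it only if the body is itself a .gz file.

    Assumes `content` is response.content, so a Content-Encoding (gzip/deflate)
    has already been removed by requests and only the Content-Type matters here.
    """
    # Strip parameters such as "; charset=binary" and normalise case
    media_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if media_type in GZIP_CONTENT_TYPES and content[:2] == GZIP_MAGIC:
        return gzip.decompress(content)
    return content
```

With requests this would be called as payload_bytes(response.content, response.headers). response.headers is case-insensitive; a plain dict passed in tests must use the exact key "Content-Type".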

import gzip

# response.content holds the raw .gz bytes; decompress them to get the XML bytes
result = gzip.decompress(response.content)
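Since the question's import_xml is meant to return a list, here is a sketch that goes on to parse the decompressed XML with ElementTree. It assumes the EPG follows the XMLTV layout (a tv root containing programme elements with a title child), which the question does not confirm; parse_epg_bytes and iter_epg_titles are hypothetical names. The first function decompresses everything in memory as the answer does; the second streams through gzip.GzipFile and ET.iterparse, which may matter for large EPG files.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, List


def parse_epg_bytes(gz_bytes: bytes) -> List[str]:
    """Gunzip a whole payload in memory and return the programme titles."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    # Assumed XMLTV layout: <tv><programme><title>...</title></programme></tv>
    return [prog.findtext("title", default="") for prog in root.iter("programme")]


def iter_epg_titles(fileobj: BinaryIO) -> Iterator[str]:
    """Decompress and parse incrementally from a binary file-like object."""
    with gzip.GzipFile(fileobj=fileobj) as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == "programme":
                yield elem.findtext("title", default="")
                elem.clear()  # drop the finished programme's children to save memory


if __name__ == "__main__":
    sample = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<tv><programme channel="c1"><title>News</title></programme>'
        b'<programme channel="c1"><title>Film</title></programme></tv>'
    )
    payload = gzip.compress(sample)
    print(parse_epg_bytes(payload))                   # ['News', 'Film']
    print(list(iter_epg_titles(io.BytesIO(payload))))  # ['News', 'Film']
```

With requests, the streaming version could be fed response.raw from requests.get(url, headers=headers, stream=True), assuming the server applies no extra Content-Encoding on top of the .gz payload, since response.raw is not decoded by default.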