Read a gzip file and write into text file

Time:10-06

How can I read a gzip file that contains JSON content and then write that content to a text file?

with open('.../notebooks/decompressed.txt', 'wb') as f_out:
    with gzip.open(".../2020-04/statuses.log.2020-04-01-00.gz", 'rb') as f_in:
        data = f_in.read()
        json.dumps(data)

Error: Object of type bytes is not JSON serializable

[Image: first 2 lines of decompressed.txt]

CodePudding user response:

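If the log content is already serialized JSON, you only need to write the decompressed data as-is. The error occurs because json.dumps() serializes Python objects to JSON and does not accept bytes.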

import gzip
with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    with open('.../notebooks/decompressed.txt', 'wb') as fout:
        data = fin.read()
        fout.write(data)

If the file is huge, import the shutil module and replace the read() and write() calls with:

shutil.copyfileobj(fin, fout)
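
Put together, the streaming variant might look like the sketch below. The function name decompress_to_text and the chunk_size default are hypothetical choices for illustration, not part of the original answer; the paths are whatever you pass in.

```python
import gzip
import shutil


def decompress_to_text(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a gzip file to dst_path without reading it all at once."""
    with gzip.open(src_path, 'rb') as fin:
        with open(dst_path, 'wb') as fout:
            # copyfileobj copies in chunks of up to chunk_size bytes,
            # so memory use stays bounded regardless of file size.
            shutil.copyfileobj(fin, fout, chunk_size)
```

Because the data is copied as raw bytes, the output file is byte-for-byte the decompressed log; no JSON parsing happens here.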

If you want to load the JSON into an object and reserialize it:

import gzip
import json

with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    with open('.../notebooks/decompressed.txt', 'w') as fout:
        obj = json.load(fin)
        json.dump(obj, fout)

If the log file is a series of JSON structures, one per line, try:

import gzip
import json

with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    for line in fin:
        obj = json.loads(line)
        # next do something with obj
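
As one possible way to "do something with obj", the sketch below parses each line and writes the records back out to a text file, one JSON document per line. The helper names iter_json_lines and write_json_lines are hypothetical, and UTF-8 encoding is an assumption about the log file.

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed object per non-empty line of a gzip-compressed file."""
    # 'rt' makes gzip decode bytes to str; UTF-8 is an assumption about the log.
    with gzip.open(path, 'rt', encoding='utf-8') as fin:
        for line in fin:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def write_json_lines(src_path, dst_path):
    """Re-serialize every record from src_path into dst_path, one per line."""
    count = 0
    with open(dst_path, 'w', encoding='utf-8') as fout:
        for obj in iter_json_lines(src_path):
            fout.write(json.dumps(obj) + '\n')
            count += 1
    return count  # number of records written
```

Only one record is held in memory at a time, so this also works for large logs as long as each individual line is a reasonably sized JSON document.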

If the JSON is too large to deserialize at once, try ijson to iterate over huge JSON structures.
