How can I read a gzip file that contains JSON and then write that content to a text file?
with open('.../notebooks/decompressed.txt', 'wb') as f_out:
    with gzip.open(".../2020-04/statuses.log.2020-04-01-00.gz", 'rb') as f_in:
        data = f_in.read()
        json.dumps(data)
Error: Object of type bytes is not JSON serializable
decompressed.txt (first 2 lines): [image omitted]
CodePudding user response:
If the log content is already serialized JSON, you only need to write the decompressed data as-is.
import gzip
with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    with open('.../notebooks/decompressed.txt', 'wb') as fout:
        data = fin.read()
        fout.write(data)
If the file is huge, import the shutil module and replace the read() and write() calls with:
shutil.copyfileobj(fin, fout)
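Put together, the streaming variant might look like the sketch below. The helper name decompress_to_file and its chunk_size parameter are hypothetical, introduced only for illustration; the paths you pass in are your own.

```python
import gzip
import shutil


def decompress_to_file(src_path, dst_path, chunk_size=64 * 1024):
    """Stream-decompress a gzip file to dst_path without reading it all into memory.

    decompress_to_file and chunk_size are illustrative names, not part of any library.
    """
    with gzip.open(src_path, 'rb') as fin, open(dst_path, 'wb') as fout:
        # copyfileobj copies in chunks of chunk_size bytes until fin is exhausted
        shutil.copyfileobj(fin, fout, chunk_size)
```

Because both files are opened in binary mode, the bytes are copied unchanged and no JSON parsing takes place.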
If you want to load the JSON into an object and reserialize it:
import gzip
import json
with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    with open('.../notebooks/decompressed.txt', 'w') as fout:
        obj = json.load(fin)
        json.dump(obj, fout)
If the log file is a series of JSON structures, one per line, then try:
import gzip
import json
with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    for line in fin:
        obj = json.loads(line)
        # next do something with obj
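The line-by-line approach can also be wrapped in a generator so the calling code deals only with parsed objects. This is a rough sketch: the name iter_json_lines is hypothetical, and it assumes the log is UTF-8 encoded and may contain blank lines.

```python
import gzip
import json


def iter_json_lines(path, encoding='utf-8'):
    """Yield one parsed object per non-empty line of a gzip-compressed file.

    iter_json_lines is an illustrative helper; UTF-8 is an assumed default.
    """
    # 'rt' makes gzip decode the bytes, so each line arrives as str
    with gzip.open(path, 'rt', encoding=encoding) as fin:
        for line in fin:
            line = line.strip()
            if line:  # skip blank lines, which json.loads would reject
                yield json.loads(line)
```

Only one line is held in memory at a time, so this also works for logs that are too large to read in one go.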
If the JSON is too large to deserialize all at once, try ijson to iterate over huge JSON structures.