Python process large JSON file containing list of objects


I am parsing a large JSON file containing an array of objects and writing the data to a CSV file in Python. The JSON file is 50 GB in size, and I am getting a MemoryError on the line (data = json.load(data_file)) while loading the file.

The code runs successfully with files of around 4 GB and below. How do I resolve the memory error when the file is 50 GB or more?

JSON File Structure:

[
 {"name":"Haks",
  "age":"22",
  "other":{
           "weight":"100"
          }
 },
 {"name":"Kahs",
  "age":"38"
  "other":{
           "weight":"120"
          }
 },
 .....
]

Code:

import json 
import csv

with open('C:/Users/username/filename.json') as data_file:
    data = json.load(data_file)  # loads the entire file into memory

arr = []

for x in data:
    obj = {}
    obj['name'] = x['name']
    obj['age'] = x['age']
    obj['weight']= x['other']['weight']
    arr.append(obj)

keys = arr[0].keys()
with open('json_output.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, keys)
    writer.writeheader()
    for item in arr:
        writer.writerow(item)

CodePudding user response:

You need a JSON parser that doesn't load all of the data into RAM at once. Streaming (incremental) parsers such as ijson, yajl-py, and bigjson read the file piece by piece, so memory use stays roughly constant regardless of file size.
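For example, ijson can yield the objects of a top-level array one at a time (the 'item' prefix selects each array element), so you can write each CSV row as it arrives instead of building the whole list first. A minimal sketch, assuming the file structure shown in the question and that ijson is installed (pip install ijson):

import csv
import ijson

fieldnames = ['name', 'age', 'weight']

with open('C:/Users/username/filename.json', 'rb') as data_file, \
     open('json_output.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames)
    writer.writeheader()
    # ijson.items() streams one object at a time from the top-level array,
    # so only the current record is held in memory.
    for x in ijson.items(data_file, 'item'):
        writer.writerow({
            'name': x['name'],
            'age': x['age'],
            'weight': x['other']['weight'],
        })

This also removes the intermediate arr list from the original code, so the script's memory use no longer grows with the size of the input file.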
