I am parsing a large JSON file containing an array of objects and writing the data to a CSV file in Python. The JSON file is 50 GB in size, and I am getting a MemoryError on the line (data = json.load(data_file)) while loading the file.
The code runs successfully with files of around 4 GB and below. How do I resolve the memory error when the file size is 50 GB or more?
JSON File Structure:
[
    {
        "name": "Haks",
        "age": "22",
        "other": {
            "weight": "100"
        }
    },
    {
        "name": "Kahs",
        "age": "38",
        "other": {
            "weight": "120"
        }
    },
    .....
]
Code:
import json
import csv

with open('C:/Users/username/filename.json') as data_file:
    data = json.load(data_file)

# Flatten each object into a dict with the columns we need
arr = []
for x in data:
    obj = {}
    obj['name'] = x['name']
    obj['age'] = x['age']
    obj['weight'] = x['other']['weight']
    arr.append(obj)

keys = arr[0].keys()
with open('json_output.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, keys)
    writer.writeheader()
    for item in arr:
        writer.writerow(item)
CodePudding user response:
You need a JSON parser that doesn't load all of the data into RAM at once. Libraries that support this kind of streaming/iterative parsing include ijson, yajl-py, and bigjson.
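With ijson, for example, you can iterate over the top-level array one object at a time and write each CSV row as you go, so only one record is ever held in memory. Here is a minimal sketch assuming the file layout shown in the question (the paths and column names are taken from there; adjust them for your real data):

import csv
import ijson

with open('C:/Users/username/filename.json', 'rb') as data_file, \
        open('json_output.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, ['name', 'age', 'weight'])
    writer.writeheader()
    # ijson.items() yields each element of the top-level array
    # (prefix 'item') lazily, so the whole file is never in memory.
    for x in ijson.items(data_file, 'item'):
        writer.writerow({
            'name': x['name'],
            'age': x['age'],
            'weight': x['other']['weight'],
        })

Because each row is written as soon as it is parsed, memory use stays roughly constant regardless of the input size.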