Home > Back-end >  JSON to CSV: keys to header issue
JSON to CSV: keys to header issue

Time:11-03

I am trying to convert a very long JSON file to CSV. I'm currently trying to use the code below to accomplish this.

import json
import csv
 
with open('G:\user\jsondata.json') as json_file:
    jsondata = json.load(json_file)
 
data_file = open('G:\user\jsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
 
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count  = 1
    csv_writer.writerow(data.values())
 
data_file.close()

This code accomplishes writing all the data to a CSV, However only takes the keys for from the first JSON line to use as the headers in the CSV. This would be fine, but further in the JSON there are more keys to used. This causes the values to be disorganized. I was wondering if anyone could help me find a way to get all the possible headers and possibly insert NA when a line doesn't contain that key or values for that key.

The JSON file is similar to this:


[

    {"time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "action": "TAKEN", "Country": "Germany", "Purchased": "YES", ...},
    {"time": "1984-10-04:4:00", "dateOfevent": "1984-10-04", "action": "NOTTAKEN", "Country": "Germany", "Purchased": "NO", ...},
    {"type": "A4", "time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "Country": "Germany", "typeOfevent": "H7", ...},
    {...},
    {...},

]

I've searched for possible solutions all over, but was unable to find anyone having a similar issue.

CodePudding user response:

Could you try to read it normally, and then cobert it to csv using .to_csv like this:

df = pd.read_json('G:\user\jsondata')
#df = pd.json_normalize(df['Column Name']) #if you want to normalize it


dv.to_csv('example.csv')

CodePudding user response:

If want to use csv and json modules to do this then can do it in two passes. First pass collects the keys for the CSV file and second pass writes the rows to CSV file. Also, must use a DictWriter since the keys differ in the different records.

import json
import csv

with open('jsondata.json') as json_file:
    jsondata = json.load(json_file)

keys = []
for data in jsondata:
    for k in data.keys():
        if k not in keys:
            keys.append(k)

with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys)
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)
  • Related