I am trying to read json data and writing to csv file but not working. How to approach for it to get my expected output
Code :
import json
import csv
import ast
# Using json loads to load data from varaible which holds json data
jdata = json.loads(data)
# Fetching all headers from json and converting into list
jkey = list(jdata[0].keys())
# get length
glen_count = len(data)
# Looping through the keys to write data to csv file
for r in range(glen_count):
for hd in jkey:
with open("/content/sample_data/abc.csv",'w') as wr:
csv_wr = csv.DictWriter(wr,fieldnames=jkey)
csv_wr.writeheader()
csv_wr.writerows(jdata[jdata[r][hd]]) # error : Type error : list indices must be integers or slices, not dict
My variable data
contain json
data = '''
[
{
"_id": {
"$oid": "5fc7897969b8ab14e1cad8f4"
},
"m": "24:6F:28:55:7C:55",
"t": "o",
"pId": {
"$oid": "5fc7894ff7fa2b262ecb6c46"
},
"pVal": {
"title": "Switch",
"val": 1,
"status": null,
"portType": "AC",
"portIcon": "AC",
"_id": "5fc7894ff7fa2b262ecb6c46",
"no": "0",
"cat": "s",
"device": {
"ports": [
"5fc7894ff7fa2b262ecb6c46"
],
"_id": "5fc7894ff7fa2b262ecb6c45",
"macId": "24:6F:28:55:7C:55",
"deviceId": "LTAC21220050009",
"dType": "5ee38d3a4afdf74cf8b07dfb",
"createdAt": "2020-12-02T12:32:15.235Z",
"updatedAt": "2020-12-02T12:32:36.773Z",
"__v": 0,
"assignedTo": "5fc62223f7fa2b262ecb6c2d"
},
"mac": "24:6F:28:55:7C:55",
"__v": 0,
"createdAt": "2020-12-02T12:32:15.237Z",
"updatedAt": "2020-12-02T12:32:57.776Z"
},
"dId": {
"$oid": "5fc7894ff7fa2b262ecb6c45"
},
"createdAt": {
"$date": "2020-12-02T12:32:57.781Z"
},
"updatedAt": {
"$date": "2020-12-02T12:32:57.781Z"
},
"__v": 0
},{
"_id": {
"$oid": "5fc7897b69b8ab14e1cad8f5"
},
"m": "24:6F:28:55:7C:55",
"t": "o",
"pId": {
"$oid": "5fc7894ff7fa2b262ecb6c46"
},
"pVal": {
"title": "Switch",
"val": 0,
"status": null,
"portType": "AC",
"portIcon": "AC",
"_id": "5fc7894ff7fa2b262ecb6c46",
"no": "0",
"cat": "s",
"device": {
"ports": [
"5fc7894ff7fa2b262ecb6c46"
],
"_id": "5fc7894ff7fa2b262ecb6c45",
"macId": "24:6F:28:55:7C:55",
"deviceId": "LTAC21220050009",
"dType": "5ee38d3a4afdf74cf8b07dfb",
"createdAt": "2020-12-02T12:32:15.235Z",
"updatedAt": "2020-12-02T12:32:36.773Z",
"__v": 0,
"assignedTo": "5fc62223f7fa2b262ecb6c2d"
},
"mac": "24:6F:28:55:7C:55",
"__v": 0,
"createdAt": "2020-12-02T12:32:15.237Z",
"updatedAt": "2020-12-02T12:32:59.951Z"
},
"dId": {
"$oid": "5fc7894ff7fa2b262ecb6c45"
},
"createdAt": {
"$date": "2020-12-02T12:32:59.955Z"
},
"updatedAt": {
"$date": "2020-12-02T12:32:59.955Z"
},
"__v": 0
}
]
Expected output is need to write to csv file : column : id , val , createdAt , updatedAt
id , val , createdAt , updatedAt
5fc7897969b8ab14e1cad8f4 , 0 , 2020-12-02T12:32:57.781Z , 2020-12-02T12:32:57.781Z
5fc7897b69b8ab14e1cad8f5 , 0 , 2020-12-02T12:32:59.955Z , 2020-12-02T12:32:59.955Z
CodePudding user response:
Okay, a few things:
jkey = list(jdata[0].keys())
will give youjkeys = ['_id', 'm', 't', 'pId', 'pVal', 'dId', 'createdAt', 'updatedAt', '__v']
, which is more than your desired keys. If your goal is to select onlyid , val , createdAt , updatedAt
,jkeys
must reflect that asjkeys = ['id' , 'val' , 'createdAt' , 'updatedAt']
. It's the value you're passing to the fieldnames, after all. Those are the header names, and it identifies the order in which the values will be written.glen_count = len(data)
is likely a typo and you meant to writejdata
, data is the string containing the json data, so you would be counting the length of this string.- The error you're going through is because the return of the expression
jdata[jdata[r][hd]]
is{'$oid': '5fc7897969b8ab14e1cad8f4'}
, a dict. You are using this one as a index for the listjdata
. - About
DictWriter
andwriterows
: You pass to its constructor specific fieldnames, in your case, it'sjkeys = ['id' , 'val' , 'createdAt' , 'updatedAt']
. This is your desired header. You must then either use a list of dictionaries withwriterows
or a single dictionary withwriterow
.Writerows
will have you write all instances at once, andwriterow
is a single instance at a time. See CSV File Reading and Writing as a reference. - Careful with looping through
with open("/content/sample_data/abc.csv",'w') as wr:
, you'll be rewriting your own file! See Reading and Writing Files to learn about the writing modes!
I will post the following as a solution, using writerows
to separate the logic of writing to a file and the logic of setting up the rows. Hopefully those are the headers desired. If not, it's easy to change according to your own needs.
import json
import csv
import ast
jdata = json.loads(data)
# Only desired headers
jkeys = ['id' , 'val' , 'createdAt' , 'updatedAt']
# setting up rows
list_rows = []
for el in jdata:
dict_jdata = {}
dict_jdata['id'] = el['_id']['$oid']
dict_jdata['val'] = str(el['pVal']['val'])
dict_jdata['createdAt'] = el['createdAt']['$date']
dict_jdata['updatedAt'] = el['updatedAt']["$date"]
list_rows.append(dict_jdata)
# Writing to a file
with open("/content/sample_data/abc2.csv",'w') as wr:
csv_wr = csv.DictWriter(wr,fieldnames=jkeys)
csv_wr.writeheader()
csv_wr.writerows(list_rows)
Hopefully this helped :)