I normalised one big nested JSON file but still getting JSON data in column refer to below example. My JSON data is as follows.
data =
[{
"task" : "jump",
"taskid" : "task-001",
"task-req" : [ {
"taskid" : "task-001",
"requestID" : "task-001-r1",
}, {
"taskid" : "task-001",
"requestID" : "task-001-r2",
}, {
"taskid" : "task-001",
"requestID" : "task-001-r3",
} ],
"task-desc" : {
"task-001-r1" : [{
"taskid" : "task-001",
"task" : "jump",
"details" : "long jump",
},
"task-001-r2" : {
"taskid" : "task-001",
"task" : "jump",
"details" : "short jump",
},
"task-001-r3" : {
"taskid" : "task-001",
"task" : "jump",
"details" : "high jump",
}]
}
}]
I want output like below:
task taskid requistID details
jump task-001 task-001-r1 long jump
jump task-001 task-001-r2 short jump
jump task-001 task-001-r3 high jump
I tried using json.normalise()
but it does not convert into the format I need. It just adding a topmost column and puts JSON data as the value in the respective row.
I am getting output as below using json.normalise()
task taskid task-req task-desc
jump task-001 [{"taskid" : "task-002","requestID" : "task-002-r1"} {"task-001-r1" : [{"taskid" : "task-001","task" : "jump","details" : "long jump"}]
do I have any way to do this?
CodePudding user response:
As for me main problem can be that it is not normal nested JSON. task-req
doesn't have nested desc
but it has separated item task-desc
. So it can't do it using only json_normalize
. It may needs extra code to convert dataframe to expected format.
But I think it can be much simpler if you use only for
-loops (without json_normalize
) to create normal list
and later convert it to dataframe
Your example data is incorrect. It seems you forgot some [ ]
in "task-desc"
so I added it. If you have different data then you should show them correctly in question.
data = [
{
"task" : "jump",
"taskid" : "task-001",
"task-req" : [
{
"taskid" : "task-001",
"requestID" : "task-001-r1",
},
{
"taskid" : "task-001",
"requestID" : "task-001-r2",
},
{
"taskid" : "task-001",
"requestID" : "task-001-r3",
}
],
"task-desc" :
{
"task-001-r1" : [
{
"taskid" : "task-001",
"task" : "jump",
"details" : "long jump",
},
],
"task-001-r2" : [
{
"taskid" : "task-001",
"task" : "jump",
"details" : "short jump",
},
],
"task-001-r3" : [
{
"taskid" : "task-001",
"task" : "jump",
"details" : "high jump",
}
]
}
}
]
# ---
print('--- rows ---')
rows = []
for item in data:
task_name = item['task']
task_id = item['taskid']
for req in item['task-req']:
req_id = req['requestID']
for req_desc in item['task-desc'][req_id]:
details = req_desc['details']
rows.append( [name, task_id, req_id, details] )
print(f'{name} | {task_id} | {req_id} | {details}')
# ---
import pandas as pd
df = pd.DataFrame(rows, columns=['task', 'taskid', 'requistID', 'details'])
print('--- dataframe ---')
print(df)
Result:
--- rows ---
jump | task-001 | task-001-r1 | long jump
jump | task-001 | task-001-r2 | short jump
jump | task-001 | task-001-r3 | high jump
--- dataframe ---
task taskid requistID details
0 jump task-001 task-001-r1 long jump
1 jump task-001 task-001-r2 short jump
2 jump task-001 task-001-r3 high jump