Home > Software engineering >  How to normalise JSON data into seperate dataframe as rows and column in python
How to normalise JSON data into seperate dataframe as rows and column in python

Time:07-28

I normalised one big nested JSON file but still getting JSON data in column refer to below example. My JSON data is as follows.

    data = 
    [{
  "task" : "jump",
  "taskid" : "task-001",
  "task-req" : [ {
    "taskid" : "task-001",
    "requestID" : "task-001-r1",
    }, {
    "taskid" : "task-001",
    "requestID" : "task-001-r2",
    }, {
    "taskid" : "task-001",
    "requestID" : "task-001-r3",
    } ],
  "task-desc" : {
    "task-001-r1" : [{
      "taskid" : "task-001",
      "task" : "jump",
      "details" : "long jump",
    },
    "task-001-r2" : {
      "taskid" : "task-001",
      "task" : "jump",
      "details" : "short jump",
    },
    "task-001-r3" : {
      "taskid" : "task-001",
      "task" : "jump",
      "details" : "high jump",
    }]
  }
    }]

I want output like below:

task    taskid    requistID     details
jump    task-001  task-001-r1   long jump
jump    task-001  task-001-r2   short jump
jump    task-001  task-001-r3   high jump

I tried using json.normalise() but it does not convert into the format I need. It just adding a topmost column and puts JSON data as the value in the respective row. I am getting output as below using json.normalise()

task    taskid    task-req                                                  task-desc
jump    task-001  [{"taskid" : "task-002","requestID" : "task-002-r1"}  {"task-001-r1" : [{"taskid" : "task-001","task" : "jump","details" : "long jump"}]

do I have any way to do this?

CodePudding user response:

As for me main problem can be that it is not normal nested JSON. task-req doesn't have nested desc but it has separated item task-desc. So it can't do it using only json_normalize. It may needs extra code to convert dataframe to expected format.

But I think it can be much simpler if you use only for-loops (without json_normalize) to create normal list and later convert it to dataframe

Your example data is incorrect. It seems you forgot some [ ] in "task-desc" so I added it. If you have different data then you should show them correctly in question.

data = [
{
  "task" : "jump",
  "taskid" : "task-001",
  "task-req" : [
    {
      "taskid" : "task-001",
      "requestID" : "task-001-r1",
    },
    {
      "taskid" : "task-001",
      "requestID" : "task-001-r2",
    },
    {
      "taskid" : "task-001",
      "requestID" : "task-001-r3",
    }
  ],
  "task-desc" :
  {
    "task-001-r1" : [
      {
        "taskid" : "task-001",
        "task" : "jump",
        "details" : "long jump",
      },
    ],
    "task-001-r2" : [
      {
        "taskid" : "task-001",
        "task" : "jump",
        "details" : "short jump",
      },
    ],
    "task-001-r3" : [
      {
        "taskid" : "task-001",
        "task" : "jump",
        "details" : "high jump",
      }
    ]
  }
}
]

# --- 

print('--- rows ---')

rows = []

for item in data:
    task_name = item['task']
    task_id   = item['taskid'] 

    for req in item['task-req']:
        req_id = req['requestID']

        for req_desc in item['task-desc'][req_id]: 
            details = req_desc['details']
            rows.append( [name, task_id, req_id, details] )

            print(f'{name} | {task_id} | {req_id} | {details}')
  
# ---

import pandas as pd

df = pd.DataFrame(rows, columns=['task', 'taskid', 'requistID', 'details'])

print('--- dataframe ---')
print(df)

Result:

--- rows ---
jump | task-001 | task-001-r1 | long jump
jump | task-001 | task-001-r2 | short jump
jump | task-001 | task-001-r3 | high jump
--- dataframe ---
   task    taskid    requistID     details
0  jump  task-001  task-001-r1   long jump
1  jump  task-001  task-001-r2  short jump
2  jump  task-001  task-001-r3   high jump
  • Related