Home > database >  Converting nested JSON to clear pd dataframe
Converting nested JSON to clear pd dataframe

Time:11-18

I've got a JSON file that I want to convert to a useful pd.DataFrame so that I can use this for further modelling. The JSON file looks like this:

json_file = {
  "x1": [
    {
      "a": "XZ12ABC1834",
      "b": "J. Doe",
      "c": [
        {
          "Amount": -50,
          "Date": "2021-08-15T10:00:00.000Z",
          "CategoryId": "abc123",
          "CounterParty": "The Farm",
          "Description": "some description",
          "Counter": "XYZ456AZ",
          "Type": "bc"
        },{
          "Amount": -1,
          "Date": "2020-08-15T10:00:00.000Z",
          "CategoryId": "cde123",
          "CounterParty": "The pool",
          "Description": "some other description",
          "Counter": "WYZ12",
          "Type": "X"
        }
         ]
      "a": "XX34XX872",
      "b": "J. Doe",
      "c": [
        {
          "Amount": -1,50,
          "Date": "2019-05-15T10:00:00.000Z",
          "CategoryId": "QWR627",
          "CounterParty": "The City",
          "Description": "last other description",
          "Counter": "QWE123",
          "Type": "S"
        }
      ]
    }
  ]
}

And I want to convert this JSON file to a dataframe that somewhat looks like this:

var1 a b amount date CategoryID Counterparty Description Counter Type
x1 XZ12ABC1834 J. Doe -50 2021-08-15T10:00:00.000Z abc123 The Farm some description XYZ456AZ bv
x1 XZ12ABC1834 J. Doe -1 2020-08-15T10:00:00.000Z cde123 The pool some other description WYZZ12 X
x1 XX34XX872 J. Doe -1.50 2019-05-15T10:00:00.000Z cde123 The city last other description QWE123 S

Hopefully this is enough info for someone to help me with this matter.

CodePudding user response:

I think that something like this should work:

import pandas as pd

result = []

for key in json_file:
  df_nested_list = pd.json_normalize(
    json_file[key], 
    record_path =['c'], 
    meta=['a', 'b']
  )
  df_nested_list['var1'] = key
  result.append(df_nested_list)
pd.concat(result)

For more information look at: https://towardsdatascience.com/how-to-convert-json-into-a-pandas-dataframe-100b2ae1e0d8

  • Related