Home > Mobile >  How to convert a nested JSON into ordered columns without using pandas
How to convert a nested JSON into ordered columns without using pandas

Time:07-06

I have spent days trying to solve this problem. This was the most effective way I could find how to iterate this object: I get the result, but when I use pandas the DataFrame structure is not correct. So I was exploring the idea of ​​creating the columns without pandas, but I don't get the expected result.

Any idea how I could improve this code?

Sample data:

{
  "financeCodeId": "fc-5599",
  "financeData": [
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    }, 
    {
      "date": "2022-02-28",
       "calcTotalReturn": -0.0424735070051586,
       "financeDataAttributes": [
       {
         "attributeId": "a-256",
         "value": "12.032791372796499"
       },
       {
       "attributeId": "a-257",
       "value": "9.975964795996589"
       },
       {
       "attributeId": "a-258",
       "value": "4.719852927810759"
       },
       {
       "attributeId": "a-259",
       "value": "4.18144793134823"
       },     
    ]
}

Code:

            items = []
            for i in response_finance['results']:
                items.append(i.get('financeCodeId'))
                for j in i['financeData']:
                    if 'financeDataAttributes' not in j:
                        items.append({j.get('date'), j.get('calcTotalReturn')})
                    else:
                        for k in j['financeDataAttributes']:
                            items.append({k.get('attributeId'), k.get('value')})
    df = pd.DataFrame.from_records(items)

    flat = pd.json_normalize(json.loads(df.to_json(orient="records")))

Output

out put

Expected:

Expected

CodePudding user response:

Since you already go through your json manually, you don't need json_normalize. You can create your own flattened record (I fixed the formatting of your json, check if it matches your real data):

response_finance = [{
  "financeCodeId": "fc-5599",
  "financeData": [
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    },
    {
      "date": "2022-01-30",
      "calcTotalReturn": 0.022425456852
    }, 
    {
      "date": "2022-02-28",
       "calcTotalReturn": -0.0424735070051586,
       "financeDataAttributes": [
       {
         "attributeId": "a-256",
         "value": "12.032791372796499"
       },
       {
       "attributeId": "a-257",
       "value": "9.975964795996589"
       },
       {
       "attributeId": "a-258",
       "value": "4.719852927810759"
       },
       {
       "attributeId": "a-259",
       "value": "4.18144793134823"
       },     
      ]
    }
  ]
}
]
items = []
for i in response_finance:
    for j in i['financeData']:
        if 'financeDataAttributes' not in j:
            item = {}
            item['financeCodeId'] = i.get('financeCodeId')
            item.update({'date': j.get('date'), 'calcTotalReturn': j.get('calcTotalReturn')})
            items.append(item)
        else:
            for k in j['financeDataAttributes']:
                item = {}
                item['financeCodeId'] = i.get('financeCodeId')
                item.update({'date': j.get('date'), 'calcTotalReturn': j.get('calcTotalReturn')})
                item.update({'attributeId': k.get('attributeId'), 'value': k.get('value')})
                items.append(item)

df = pd.DataFrame.from_records(items)

print(df)

Output:

  financeCodeId        date  calcTotalReturn attributeId               value
0       fc-5599  2022-01-30         0.022425         NaN                 NaN
1       fc-5599  2022-01-30         0.022425         NaN                 NaN
2       fc-5599  2022-01-30         0.022425         NaN                 NaN
3       fc-5599  2022-01-30         0.022425         NaN                 NaN
4       fc-5599  2022-01-30         0.022425         NaN                 NaN
5       fc-5599  2022-02-28        -0.042474       a-256  12.032791372796499
6       fc-5599  2022-02-28        -0.042474       a-257   9.975964795996589
7       fc-5599  2022-02-28        -0.042474       a-258   4.719852927810759
8       fc-5599  2022-02-28        -0.042474       a-259    4.18144793134823
  • Related