Home > Net >  Multilevel uneven Json to table using Python
Multilevel uneven Json to table using Python

Time:09-21

I am trying to normalize/tabulate the multilevel data in JSON using Python Pandas.

Json data

{
  "continent": "Asia",
  "countries": [
    {
      "total" : "28",
      "country": [
        {
          "name": "japan",
          "economy": {
            "business": "25%",
            "jobs": "50%",
            "government": "25%"
          },
          "population": [
            {
              "test1": "20L",
              "test2": "15L"
            }
          ]
        },
        {
           "name": "china",
          "economy": {
            "business": "35%",
            "jobs": "30%",
            "government": "35%"
          },
          "population": [
            {
              "test1": "30L"
            }
          ]
        }
      ]
    }
  ]
}

Required Output

Country_name Economy_type Percentage_economy
Japan Business 25%
Japan Jobs 50%
Japan Government 25%
China Business 35%
China Jobs 30%
China Government 35%

and

Country_name Population type Percentage
Japan test1 20L
Japan test2 15L
China test1 30L

What I have tried

I've tried using pandas json_normalizer , flat_json and tried below code:

import json
import pandas as pd

with open('test.json') as f:
    d = json.load(f)
dataFrame = pd.DataFrame(columns=d[0].keys())
for i in range(len(d)):
    dataFrame.loc[i] = d[i].values()
    print(dataFrame)

However, I'm not getting the desired output. I'm seeing an error:

KeyError: 0

or

name population ... economy.government total
0 japan [{'test1': '20L', 'test2': '15L'}] ... 25% 28
1 china [{'test1': '30L'}] ... 35% 28

CodePudding user response:

This is ugly but it works:

# di is the dictionary from the question

a = pd.json_normalize(di, record_path=["countries", ["country"]])

b = (pd.json_normalize(di['countries'], record_path=["country", ["population"]])
     .merge(a[['name']], left_index=True, right_index=True)
     .melt(id_vars='name')
     .dropna()
     .sort_values(by='name', ascending=False)
     .set_axis(['Country_name', 'Population_type', 'Percentage'], axis=1)
    )

a = (a
     .melt(id_vars='name', value_vars= ['economy.business', 'economy.jobs', 'economy.government'])
     .set_axis(['Country_name', 'Economy_type', 'Percentage_economy'], axis=1)
     .sort_values(by='Country_name', ascending=False)
    )

a.Economy_type = a.Economy_type.str[8:]  # remove prefixes

Output a:

    Country_name    Economy_type    Percentage_economy
0   japan            business        25%
2   japan            jobs            50%
4   japan            government      25%
1   china            business        35%
3   china            jobs            30%
5   china            government      35%

Output b:

    Country_name    Population_type     Percentage
0   japan           test1               20L
2   japan           test2               15L
1   china           test1               30L
  • Related