Multilevel uneven Json to table using Python-CodePudding

I am trying to normalize/tabulate the multilevel data in JSON using Python Pandas.

Json data

{
  "continent": "Asia",
  "countries": [
    {
      "total" : "28",
      "country": [
        {
          "name": "japan",
          "economy": {
            "business": "25%",
            "jobs": "50%",
            "government": "25%"
          },
          "population": [
            {
              "test1": "20L",
              "test2": "15L"
            }
          ]
        },
        {
           "name": "china",
          "economy": {
            "business": "35%",
            "jobs": "30%",
            "government": "35%"
          },
          "population": [
            {
              "test1": "30L"
            }
          ]
        }
      ]
    }
  ]
}

Required Output

Country_name	Economy_type	Percentage_economy
Japan	Business	25%
Japan	Jobs	50%
Japan	Government	25%
China	Business	35%
China	Jobs	30%
China	Government	35%

and

Country_name	Population type	Percentage
Japan	test1	20L
Japan	test2	15L
China	test1	30L

What I have tried

I've tried using pandas json_normalizer , flat_json and tried below code:

import json
import pandas as pd

with open('test.json') as f:
    d = json.load(f)
dataFrame = pd.DataFrame(columns=d[0].keys())
for i in range(len(d)):
    dataFrame.loc[i] = d[i].values()
    print(dataFrame)

However, I'm not getting the desired output. I'm seeing an error:

KeyError: 0

	name	population	...	economy.government	total
0	japan	[{'test1': '20L', 'test2': '15L'}]	...	25%	28
1	china	[{'test1': '30L'}]	...	35%	28

CodePudding user response：

This is ugly but it works:

# di is the dictionary from the question

a = pd.json_normalize(di, record_path=["countries", ["country"]])

b = (pd.json_normalize(di['countries'], record_path=["country", ["population"]])
     .merge(a[['name']], left_index=True, right_index=True)
     .melt(id_vars='name')
     .dropna()
     .sort_values(by='name', ascending=False)
     .set_axis(['Country_name', 'Population_type', 'Percentage'], axis=1)
    )

a = (a
     .melt(id_vars='name', value_vars= ['economy.business', 'economy.jobs', 'economy.government'])
     .set_axis(['Country_name', 'Economy_type', 'Percentage_economy'], axis=1)
     .sort_values(by='Country_name', ascending=False)
    )

a.Economy_type = a.Economy_type.str[8:]  # remove prefixes

Output a:

    Country_name    Economy_type    Percentage_economy
0   japan            business        25%
2   japan            jobs            50%
4   japan            government      25%
1   china            business        35%
3   china            jobs            30%
5   china            government      35%

Output b:

    Country_name    Population_type     Percentage
0   japan           test1               20L
2   japan           test2               15L
1   china           test1               30L