I am trying to normalize/tabulate the multilevel data in JSON using Python Pandas.
Json data
{
"continent": "Asia",
"countries": [
{
"total" : "28",
"country": [
{
"name": "japan",
"economy": {
"business": "25%",
"jobs": "50%",
"government": "25%"
},
"population": [
{
"test1": "20L",
"test2": "15L"
}
]
},
{
"name": "china",
"economy": {
"business": "35%",
"jobs": "30%",
"government": "35%"
},
"population": [
{
"test1": "30L"
}
]
}
]
}
]
}
Required Output
Country_name | Economy_type | Percentage_economy |
---|---|---|
Japan | Business | 25% |
Japan | Jobs | 50% |
Japan | Government | 25% |
China | Business | 35% |
China | Jobs | 30% |
China | Government | 35% |
and
Country_name | Population type | Percentage |
---|---|---|
Japan | test1 | 20L |
Japan | test2 | 15L |
China | test1 | 30L |
What I have tried
I've tried using pandas json_normalizer
, flat_json
and tried below code:
import json
import pandas as pd
with open('test.json') as f:
d = json.load(f)
dataFrame = pd.DataFrame(columns=d[0].keys())
for i in range(len(d)):
dataFrame.loc[i] = d[i].values()
print(dataFrame)
However, I'm not getting the desired output. I'm seeing an error:
KeyError: 0
or
name | population | ... | economy.government | total | |
---|---|---|---|---|---|
0 | japan | [{'test1': '20L', 'test2': '15L'}] | ... | 25% | 28 |
1 | china | [{'test1': '30L'}] | ... | 35% | 28 |
CodePudding user response:
This is ugly but it works:
# di is the dictionary from the question
a = pd.json_normalize(di, record_path=["countries", ["country"]])
b = (pd.json_normalize(di['countries'], record_path=["country", ["population"]])
.merge(a[['name']], left_index=True, right_index=True)
.melt(id_vars='name')
.dropna()
.sort_values(by='name', ascending=False)
.set_axis(['Country_name', 'Population_type', 'Percentage'], axis=1)
)
a = (a
.melt(id_vars='name', value_vars= ['economy.business', 'economy.jobs', 'economy.government'])
.set_axis(['Country_name', 'Economy_type', 'Percentage_economy'], axis=1)
.sort_values(by='Country_name', ascending=False)
)
a.Economy_type = a.Economy_type.str[8:] # remove prefixes
Output a:
Country_name Economy_type Percentage_economy
0 japan business 25%
2 japan jobs 50%
4 japan government 25%
1 china business 35%
3 china jobs 30%
5 china government 35%
Output b:
Country_name Population_type Percentage
0 japan test1 20L
2 japan test2 15L
1 china test1 30L