I have a nested JSON as follows:
{
"group": {
"groupname": "grp1",
"groupid": 1,
"city": "London"
},
"persons": {
"0": {
"name": "john",
"age": 12,
"gender": "M",
"groupid": 1
},
"1": {
"name": "maat",
"age": 15,
"gender": "M",
"groupid": 1
},
"2": {
"name": "chrissle",
"age": 10,
"gender": "F",
"groupid": 1
},
"3": {
"name": "stacy",
"age": 11,
"gender": "F",
"groupid": 1
},
"4": {
"name": "mark",
"age": 12,
"gender": "M",
"groupid": 1
},
"5": {
"name": "job",
"age": 12,
"gender": "M",
"groupid": 1
}
},
"group": {
"groupname": "grp1",
"groupid": 2,
"city": "NewYork"
},
"persons": {
"0": {
"name": "will",
"age": 12,
"gender": "M",
"groupid": 2
},
"1": {
"name": "phil",
"age": 15,
"gender": "M",
"groupid": 2
},
"2": {
"name": "winnie",
"age": 10,
"gender": "F",
"groupid": 2
}
}
}
I want to separate the two sections group
and persons
into two df respectively.
For the second df persons
I want to include the section numbers as columns as follows:
id name age gender groupid
0 john 12 M 1
1 maat 15 M 1
2 chrissle 10 F 1
I have loaded the JSON as a list of dict and converted it into a df:
data= pd.DataFrame.from_dict(data)
I can then get persons
personsdf= personsdf['persons']
This will however give me a df with one column that has dict rows for every persons section.
I have tried below to unnest the dict rows:
finaldf= pd.DataFrame()
for index, row in personsdf.iterrows():
row_data=row['personsdf']
row_data = pd.DataFrame.from_dict(row_data)
row_data = row_data.T
finaldf= finaldf.append(row_data, ignore_index=True)
But then I get all the columns except the section number which gets lost. Is there a better way to approach this?
CodePudding user response:
If I understand you correctly you want to create two dataframes: one for groups and the second for persons:
data = [
{
"group": {"groupname": "grp1", "groupid": 1, "city": "London"},
"persons": {
"0": {"name": "john", "age": 12, "gender": "M", "groupid": 1},
"1": {"name": "maat", "age": 15, "gender": "M", "groupid": 1},
"2": {"name": "chrissle", "age": 10, "gender": "F", "groupid": 1},
"3": {"name": "stacy", "age": 11, "gender": "F", "groupid": 1},
"4": {"name": "mark", "age": 12, "gender": "M", "groupid": 1},
"5": {"name": "job", "age": 12, "gender": "M", "groupid": 1},
},
},
{
"group": {"groupname": "grp1", "groupid": 2, "city": "NewYork"},
"persons": {
"0": {"name": "will", "age": 12, "gender": "M", "groupid": 2},
"1": {"name": "phil", "age": 15, "gender": "M", "groupid": 2},
"2": {"name": "winnie", "age": 10, "gender": "F", "groupid": 2},
},
},
]
df1 = pd.DataFrame([d["group"] for d in data])
df2 = pd.DataFrame(
[{"id": k, **v} for d in data for k, v in d["persons"].items()]
)
print(df1)
print(df2)
Prints:
groupname groupid city
0 grp1 1 London
1 grp1 2 NewYork
id name age gender groupid
0 0 john 12 M 1
1 1 maat 15 M 1
2 2 chrissle 10 F 1
3 3 stacy 11 F 1
4 4 mark 12 M 1
5 5 job 12 M 1
6 0 will 12 M 2
7 1 phil 15 M 2
8 2 winnie 10 F 2