I am running into a problem with json_normalise, I can't wrap my head around it. The issue arises when the "record_path" part of the function has nested data (['extra', 'students']), I can't seem to get used nested data in the "meta" part of the function. Works fine when the record path is not nested. Any ideas?
json_list = [
{
'class': 'Year 1',
'student count': 20,
'room': 'Yellow',
'info': {
'teachers': {
'math': 'Rick Scott',
'physics': 'Elon Mask'
}
},
'extra': {
'students': [
{
'name': 'Tom',
'sex': 'M',
'grades': { 'math': 66, 'physics': 77 }
},
{
'name': 'James',
'sex': 'M',
'grades': { 'math': 80, 'physics': 78 }
},
]
}
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {
'teachers': {
'math': 'Alan Turing',
'physics': 'Albert Einstein'
}
},
'extra': {
'students': [
{ 'name': 'Tony', 'sex': 'M' },
{ 'name': 'Jacqueline', 'sex': 'F' },
]
}
},
]
print(pd.json_normalize(
json_list,
record_path = ['extra', 'students'],
meta=['class', 'room', ['info', 'teachers', 'math']]
) )
CodePudding user response:
You can transform the json_list
before creating the dataframe or try to modify dataframe step-by-step by .apply(pd.Series)
/.explode
:
df = pd.DataFrame(json_list)
df = pd.concat(
[df, df.pop("info").apply(pd.Series), df.pop("extra").apply(pd.Series)],
axis=1,
).explode("students")
df = pd.concat(
[
df,
df.pop("teachers").apply(pd.Series).add_prefix("teachers_"),
df.pop("students").apply(pd.Series),
],
axis=1,
)
df = pd.concat(
[df, df.pop("grades").apply(pd.Series).add_prefix("grades_")],
axis=1,
).drop(columns="grades_0")
print(df)
Prints:
class student count room teachers_math teachers_physics name sex grades_math grades_physics
0 Year 1 20 Yellow Rick Scott Elon Mask Tom M 66.0 77.0
0 Year 1 20 Yellow Rick Scott Elon Mask James M 80.0 78.0
1 Year 2 25 Blue Alan Turing Albert Einstein Tony M NaN NaN
1 Year 2 25 Blue Alan Turing Albert Einstein Jacqueline F NaN NaN