Home > Enterprise >  json_normalise issue when "record_path" variable uses nested data & "meta" varia
json_normalise issue when "record_path" variable uses nested data & "meta" varia

Time:09-09

I am running into a problem with json_normalise, I can't wrap my head around it. The issue arises when the "record_path" part of the function has nested data (['extra', 'students']), I can't seem to get used nested data in the "meta" part of the function. Works fine when the record path is not nested. Any ideas?

    json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask'
            }
        },
        'extra': {
            'students': [
                {
                    'name': 'Tom',
                    'sex': 'M',
                    'grades': { 'math': 66, 'physics': 77 }
                },
                {
                    'name': 'James',
                    'sex': 'M',
                    'grades': { 'math': 80, 'physics': 78 }
                },
            ]
        }
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                'math': 'Alan Turing',
                'physics': 'Albert Einstein'
            }
        },
        'extra': {
            'students': [
                { 'name': 'Tony', 'sex': 'M' },
                { 'name': 'Jacqueline', 'sex': 'F' },
            ]
        }
    },
]

print(pd.json_normalize(
    json_list,
    record_path = ['extra', 'students'],
    meta=['class', 'room', ['info', 'teachers', 'math']]
) )

CodePudding user response:

You can transform the json_list before creating the dataframe or try to modify dataframe step-by-step by .apply(pd.Series)/.explode:

df = pd.DataFrame(json_list)
df = pd.concat(
    [df, df.pop("info").apply(pd.Series), df.pop("extra").apply(pd.Series)],
    axis=1,
).explode("students")
df = pd.concat(
    [
        df,
        df.pop("teachers").apply(pd.Series).add_prefix("teachers_"),
        df.pop("students").apply(pd.Series),
    ],
    axis=1,
)
df = pd.concat(
    [df, df.pop("grades").apply(pd.Series).add_prefix("grades_")],
    axis=1,
).drop(columns="grades_0")

print(df)

Prints:

    class  student count    room teachers_math teachers_physics        name sex  grades_math  grades_physics
0  Year 1             20  Yellow    Rick Scott        Elon Mask         Tom   M         66.0            77.0
0  Year 1             20  Yellow    Rick Scott        Elon Mask       James   M         80.0            78.0
1  Year 2             25    Blue   Alan Turing  Albert Einstein        Tony   M          NaN             NaN
1  Year 2             25    Blue   Alan Turing  Albert Einstein  Jacqueline   F          NaN             NaN
  • Related