Home > database >  How to convert a column with Json to a dataframe column in Pandas?
How to convert a column with Json to a dataframe column in Pandas?

Time:02-22

I have this DataFrame with one field that is a json:

       id               lastActionDate                                            clients
    0  26   2021-07-02T13:59:35.273662  [{'id': '123', 'personType': 1, 'profileType', 'businessName': 'Michael Corleone':...
    1  30  2021-07-24T15:44:38.2403574  [{'id': '456', 'personType': 1, 'profileType', 'businessName': 'Vito Corleone', :...

I want tranform this json to columns, like expected result below:

       id               lastActionDate   id  personType  profileType      businessName
    0  26   2021-07-02T13:59:35.273662   123           1            2  Michael Corleone
    1  30  2021-07-24T15:44:38.2403574   456           1            2     Vito Corleone

How can I do this code?

CodePudding user response:

What you can do is define the following function (which is applicable to any dataframe):

def flatten_nested_json_df(df):
    df = df.reset_index()
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns) # inplace

        for col in list_columns:
            print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df

and do:

flatten_nested_json_df(your_df)

where your_df is the dataframe you posted.

Example:

df_f = {
    "school_name": "ABC primary school",
    "class": "Year 1",
    "students": [
    {
        "id": "A001",
        "name": "Tom",
        "math": 60,
        "physics": 66,
        "chemistry": 61
    },
    {
        "id": "A002",
        "name": "James",
        "math": 89,
        "physics": 76,
        "chemistry": 51
    },
    {
        "id": "A003",
        "name": "Jenny",
        "math": 79,
        "physics": 90,
        "chemistry": 78
    }]
}

and

df = pd.json_normalize(df_f)

which is

 school_name      class   \
0  ABC primary school  Year 1   

                       students                       
0  [{'id': 'A001', 'name': 'Tom', 'math': 60, 'ph...

and applying the function

flatten_nested_json_df(df)

will return:

index     school_name      class  students.id students.name  students.math  \
0    0    ABC primary school  Year 1     A001           Tom          60         
0    0    ABC primary school  Year 1     A002         James          89         
0    0    ABC primary school  Year 1     A003         Jenny          79         

   students.physics  students.chemistry  
0         66                 61          
0         76                 51          
0         90                 78          
​

CodePudding user response:

You can do it like this:

 df_nested_list = pd.json_normalize(data, record_path =['clients'])
  • Related