I have this DataFrame with one field that is a json:
id lastActionDate clients
0 26 2021-07-02T13:59:35.273662 [{'id': '123', 'personType': 1, 'profileType', 'businessName': 'Michael Corleone':...
1 30 2021-07-24T15:44:38.2403574 [{'id': '456', 'personType': 1, 'profileType', 'businessName': 'Vito Corleone', :...
I want tranform this json to columns, like expected result below:
id lastActionDate id personType profileType businessName
0 26 2021-07-02T13:59:35.273662 123 1 2 Michael Corleone
1 30 2021-07-24T15:44:38.2403574 456 1 2 Vito Corleone
How can I do this code?
CodePudding user response:
What you can do is define the following function (which is applicable to any dataframe):
def flatten_nested_json_df(df):
df = df.reset_index()
s = (df.applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df.applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
while len(list_columns) > 0 or len(dict_columns) > 0:
new_columns = []
for col in dict_columns:
horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
horiz_exploded.index = df.index
df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
new_columns.extend(horiz_exploded.columns) # inplace
for col in list_columns:
print(f"exploding: {col}")
df = df.drop(columns=[col]).join(df[col].explode().to_frame())
new_columns.append(col)
s = (df[new_columns].applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df[new_columns].applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
return df
and do:
flatten_nested_json_df(your_df)
where your_df
is the dataframe you posted.
Example:
df_f = {
"school_name": "ABC primary school",
"class": "Year 1",
"students": [
{
"id": "A001",
"name": "Tom",
"math": 60,
"physics": 66,
"chemistry": 61
},
{
"id": "A002",
"name": "James",
"math": 89,
"physics": 76,
"chemistry": 51
},
{
"id": "A003",
"name": "Jenny",
"math": 79,
"physics": 90,
"chemistry": 78
}]
}
and
df = pd.json_normalize(df_f)
which is
school_name class \
0 ABC primary school Year 1
students
0 [{'id': 'A001', 'name': 'Tom', 'math': 60, 'ph...
and applying the function
flatten_nested_json_df(df)
will return:
index school_name class students.id students.name students.math \
0 0 ABC primary school Year 1 A001 Tom 60
0 0 ABC primary school Year 1 A002 James 89
0 0 ABC primary school Year 1 A003 Jenny 79
students.physics students.chemistry
0 66 61
0 76 51
0 90 78
CodePudding user response:
You can do it like this:
df_nested_list = pd.json_normalize(data, record_path =['clients'])