I am trying to figure out how to normalize the nested JSON response sampled below.
Right now, json_normalize(res,record_path=['data'])
is giving me MOST of the data I need but what I would really like is the detail in the "session_pageviews" list/dict with the attributes of the data list/dic included.
I tried json_normalize(res,record_path=['data', ['session_pageviews']], meta = ['data'])
but I get an error: ValueError: operands could not be broadcast together with shape (32400,) (180,)
I also tried json_normalize(res,record_path=['data'], max_level = 1)
but that does not unnest session_pageviews
Any help would be appreciated!
CodePudding user response:
You can try to apply the following function to your json:
def flatten_nested_json_df(df):
df = df.reset_index()
s = (df.applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df.applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
while len(list_columns) > 0 or len(dict_columns) > 0:
new_columns = []
for col in dict_columns:
horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
horiz_exploded.index = df.index
df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
new_columns.extend(horiz_exploded.columns) # inplace
for col in list_columns:
#print(f"exploding: {col}")
df = df.drop(columns=[col]).join(df[col].explode().to_frame())
new_columns.append(col)
s = (df[new_columns].applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df[new_columns].applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
return df
by doing this:
df1= flatten_nested_json_df(df)
where
df = pd.json_normalize(json)
That should give you all the information contained in your json.