How might I retain a row ID mapping after exploding a nested JSON?
Consider this example:
df = pd.DataFrame({'id': [1, 2], 'xd': [
[
{
"status": "pass",
"desc": "desc",
"actionable": False,
"err_code": "None",
"err_msg": "None"
},
{
"status": "pass",
"desc": "desc",
"actionable": False,
"err_code": "err",
"err_msg": "not found"
}
],
[
{
"status": "fail",
"desc": "desc",
"actionable": True,
"err_code": "None",
"err_msg": "None",
},
{
"status": "pass",
"desc": "desc",
"actionable": True,
"err_code": "err",
"err_msg": "found"
}
] ]})
# example df
id xd
0 1 [{'status': 'pass', 'desc': 'desc', 'actionabl...
1 2 [{'status': 'fail', 'desc': 'desc', 'actionabl...
Now explode it:
pd.json_normalize(df['xd'].explode())
status desc actionable err_code err_msg
0 pass desc False None None
1 pass desc False err not found
2 fail desc True None None
3 pass desc True err found
Ok great, but now I want to retain a way that lets me link the first two rows as belonging to id 1
and the second two rows belonging two id 2
for an arbitrarily deep nested JSON xd
.
CodePudding user response:
You could explode
the column from df
; create a DataFrame from "xd" then join
it back:
df = df.explode('xd').reset_index(drop=True)
df = df.join(pd.DataFrame(df['xd'].tolist())).drop(columns='xd')
Output:
id status desc actionable err_code err_msg
0 1 pass desc False None None
1 1 pass desc False err not found
2 2 fail desc True None None
3 2 pass desc True err found
CodePudding user response:
Perhaps just explode
the column, and then pipe
it and call json_normalize
and use the exploded index?
new_df = df['xd'].explode().pipe(lambda x: pd.json_normalize(x).set_index(x.index))
Output:
>>> new_df
status desc actionable err_code err_msg
0 pass desc False None None
0 pass desc False err not found
1 fail desc True None None
1 pass desc True err found