Pandas: Explode Nested JSON and Retain Row ID-CodePudding

How might I retain a row ID mapping after exploding a nested JSON?

Consider this example:

df = pd.DataFrame({'id': [1, 2], 'xd': [
    [
     {
        "status": "pass",
        "desc": "desc",
        "actionable": False,
        "err_code": "None",
        "err_msg": "None"
        },
      {
         "status": "pass",
         "desc": "desc",
         "actionable": False,
         "err_code": "err",
         "err_msg": "not found"
         }
     ],
    [
     {
        "status": "fail",
        "desc": "desc",
        "actionable": True,
        "err_code": "None",
        "err_msg": "None",
    },
   {
      "status": "pass",
      "desc": "desc",
      "actionable": True,
      "err_code": "err",
      "err_msg": "found"
      }
     ] ]})

# example df
    id  xd
0   1   [{'status': 'pass', 'desc': 'desc', 'actionabl...
1   2   [{'status': 'fail', 'desc': 'desc', 'actionabl...

Now explode it:

pd.json_normalize(df['xd'].explode())

    status  desc    actionable  err_code    err_msg
0   pass    desc    False       None        None
1   pass    desc    False       err         not found
2   fail    desc    True        None        None
3   pass    desc    True        err         found

Ok great, but now I want to retain a way that lets me link the first two rows as belonging to id 1 and the second two rows belonging two id 2 for an arbitrarily deep nested JSON xd.

CodePudding user response：

You could explode the column from df; create a DataFrame from "xd" then join it back:

df = df.explode('xd').reset_index(drop=True)
df = df.join(pd.DataFrame(df['xd'].tolist())).drop(columns='xd')

Output:

   id status  desc  actionable err_code    err_msg
0   1   pass  desc       False     None       None
1   1   pass  desc       False      err  not found
2   2   fail  desc        True     None       None
3   2   pass  desc        True      err      found

CodePudding user response：

Perhaps just explode the column, and then pipe it and call json_normalize and use the exploded index?

new_df = df['xd'].explode().pipe(lambda x: pd.json_normalize(x).set_index(x.index))

Output:

>>> new_df
  status  desc  actionable err_code    err_msg
0   pass  desc       False     None       None
0   pass  desc       False      err  not found
1   fail  desc        True     None       None
1   pass  desc        True      err      found