Home > Back-end >  Convert a MultiIndex pandas DataFrame to a nested JSON
Convert a MultiIndex pandas DataFrame to a nested JSON

Time:02-27

I have the following Dataframe with MultiIndex rows in pandas.

               time    available_slots       status
month day                                          
1     1    10:00:00                  1    AVAILABLE
      1    12:00:00                  1    AVAILABLE
      1    14:00:00                  1    AVAILABLE
      1    16:00:00                  1    AVAILABLE
      1    18:00:00                  1    AVAILABLE
      2    10:00:00                  1    AVAILABLE
...             ...                ...          ...
2     28   12:00:00                  1    AVAILABLE
      28   14:00:00                  1    AVAILABLE
      28   16:00:00                  1    AVAILABLE
      28   18:00:00                  1    AVAILABLE
      28   20:00:00                  1    AVAILABLE

And I need to transform it to a hierarchical nested JSON as this:

[
    {
        "month": 1,
        "days": [
            {
                "day": 1,
                "slots": [
                    {
                        "time": "10:00:00",
                        "available_slots": 1,
                        "status": "AVAILABLE"
                    },
                    {
                        "time": "12:00:00",
                        "available_slots": 1,
                        "status": "AVAILABLE"
                    },
                    ...
                ]
            },
            {
                "day": 2,
                "slots": [
                    ...
                ]
            }
        ]
    },
    {
        "month": 2,
        "days":[
            {
                "day": 1,
                "slots": [
                    ...
                ]
            }
        ]
    },
    ...
]

Unfortunately, it is not as easy as doing df.to_json(orient="index").

Does anyone know if there is a method in pandas to perform this kind of transformations? or in what way I could iterate over the DataFrame to build the final object?

CodePudding user response:

Here's one way. Basically repeated groupby apply(to_dict) reset_index until we get the desired shape:

out = (df.groupby(level=[0,1])
       .apply(lambda x: x.to_dict('records'))
       .reset_index()
       .rename(columns={0:'slots'})
       .groupby('month')
       .apply(lambda x: x[['day','slots']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'days'})
       .to_json(orient='records', indent=True)
      )

Output:

[
 {
  "month":1,
  "days":[
   {
    "day":1,
    "slots":[
     {
      "time":"10:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"12:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"14:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"16:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"18:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     }
    ]
   },
   {
    "day":2,
    "slots":[
     {
      "time":"10:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     }
    ]
   }
  ]
 },
 {
  "month":2,
  "days":[
   {
    "day":28,
    "slots":[
     {
      "time":"12:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"14:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"16:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"18:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"20:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     }
    ]
   }
  ]
 }
]

CodePudding user response:

You can use a double loop for each level of your index:

data = []
for month, df1 in df.groupby(level=0):
    data.append({'month': month, 'days': []})
    for day, df2 in df1.groupby(level=1):
        data[-1]['days'].append({'day': day, 'slots': df2.to_dict('records')})

Output:

import json
print(json.dumps(data, indent=2))

[
  {
    "month": 1,
    "days": [
      {
        "day": 1,
        "slots": [
          {
            "time": "10:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "12:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "14:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "16:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "18:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          }
        ]
      },
      {
        "day": 2,
        "slots": [
          {
            "time": "10:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          }
        ]
      }
    ]
  },
  {
    "month": 2,
    "days": [
      {
        "day": 28,
        "slots": [
          {
            "time": "12:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "14:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "18:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "20:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          }
        ]
      }
    ]
  }
]
  • Related