Home > Software design >  How can I manipulate this nested Json data in Pandas?
How can I manipulate this nested Json data in Pandas?

Time:02-18

I have multiple nested JSON objects which are extracted from the REST API and look like this:

{
"count": 2,
"value": [{
    "teamMember": {
        "id": "ID1",
        "displayName": "NAME1",
        "uniqueName": "UNIQUE1",
        "url": "URL1",
        "imageUrl": "ImgURL1"
    },
    "activities": [{
        "capacityPerDay": 3.5,
        "name": ""
    }],
    "daysOff": [{
        "start": "2021-11-15T00:00:00Z",
        "end": "2021-11-15T00:00:00Z"
    }, {
        "start": "2021-11-26T00:00:00Z",
        "end": "2021-11-26T00:00:00Z"
    }],
    "url": "URL1"
}, {
    "teamMember": {
        "id": "ID2",
        "displayName": "NAME2",
        "uniqueName": "UNIQUE2",
        "url": "URL2",
        "imageUrl": "IMAGEURL2"
    },
    "activities": [{
        "capacityPerDay": 2.9,
        "name": ""
    }],
    "daysOff": [{
        "start": "2021-11-12T00:00:00Z",
        "end": "2021-11-12T00:00:00Z"
    }, {
        "start": "2021-12-02T00:00:00Z",
        "end": "2021-12-02T00:00:00Z"
    }],
    "url": "URL2"
}]}

I want to extract some information from this object to end up with something like this: here

How can I do that in pandas? any hints would be much appreciated!

CodePudding user response:

You have regular JSON with an array. There's no nested JSON. You can access the contents of the array in a regular way.

json_string = '''
...    <-- put your JSON here or get the JSON string in a different way
'''
import json
json_object = json.loads(json_string)
print(json_object["value"][0])
print(json_object["value"][1])

CodePudding user response:

If you need a flat table as a result

|---|----------------|-------|------|-----|-------|---------|------|-----------|
| 0 | capacityPerDay | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | capacityPerDay | end   | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | name           | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | name           | end   | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | capacityPerDay | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | capacityPerDay | end   | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | name           | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 0 | name           | end   | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1   |
| 1 | capacityPerDay | nan   | nan  | ID2 | NAME2 | UNIQUE2 | URL2 | IMAGEURL2 |
| 1 | name           | nan   | nan  | ID2 | NAME2 | UNIQUE2 | URL2 | IMAGEURL2 |

You may just use pd.json_normalize() with few explode() like this:

import pandas as pd

#your_json = {smth}
pd.json_normalize(your_json['value']).explode('activities').explode('daysOff').explode('activities').explode('daysOff')

Of course, there are many ways to write this more clearly, but at least now you know which direction to look in.

  • Related