I have multiple nested JSON objects which are extracted from the REST API and look like this:
{
"count": 2,
"value": [{
"teamMember": {
"id": "ID1",
"displayName": "NAME1",
"uniqueName": "UNIQUE1",
"url": "URL1",
"imageUrl": "ImgURL1"
},
"activities": [{
"capacityPerDay": 3.5,
"name": ""
}],
"daysOff": [{
"start": "2021-11-15T00:00:00Z",
"end": "2021-11-15T00:00:00Z"
}, {
"start": "2021-11-26T00:00:00Z",
"end": "2021-11-26T00:00:00Z"
}],
"url": "URL1"
}, {
"teamMember": {
"id": "ID2",
"displayName": "NAME2",
"uniqueName": "UNIQUE2",
"url": "URL2",
"imageUrl": "IMAGEURL2"
},
"activities": [{
"capacityPerDay": 2.9,
"name": ""
}],
"daysOff": [{
"start": "2021-11-12T00:00:00Z",
"end": "2021-11-12T00:00:00Z"
}, {
"start": "2021-12-02T00:00:00Z",
"end": "2021-12-02T00:00:00Z"
}],
"url": "URL2"
}]}
I want to extract some information from this object to end up with something like this: here
How can I do that in pandas? any hints would be much appreciated!
CodePudding user response:
You have regular JSON with an array. There's no nested JSON. You can access the contents of the array in a regular way.
json_string = '''
... <-- put your JSON here or get the JSON string in a different way
'''
import json
json_object = json.loads(json_string)
print(json_object["value"][0])
print(json_object["value"][1])
CodePudding user response:
If you need a flat table as a result
|---|----------------|-------|------|-----|-------|---------|------|-----------|
| 0 | capacityPerDay | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | capacityPerDay | end | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | name | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | name | end | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | capacityPerDay | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | capacityPerDay | end | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | name | start | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 0 | name | end | URL1 | ID1 | NAME1 | UNIQUE1 | URL1 | ImgURL1 |
| 1 | capacityPerDay | nan | nan | ID2 | NAME2 | UNIQUE2 | URL2 | IMAGEURL2 |
| 1 | name | nan | nan | ID2 | NAME2 | UNIQUE2 | URL2 | IMAGEURL2 |
You may just use pd.json_normalize()
with few explode()
like this:
import pandas as pd
#your_json = {smth}
pd.json_normalize(your_json['value']).explode('activities').explode('daysOff').explode('activities').explode('daysOff')
Of course, there are many ways to write this more clearly, but at least now you know which direction to look in.