I have the following (excerpt) json data structure:
{
"apiToken": {
"createdAt": "2022-03-04T12:18:29.000956Z",
"expiresAt": "2022-09-04T12:18:29.000956Z"
},
"canGenerateApiToken": true,
"dateJoined": "2021-01-29T10:07:04.395172Z",
"email": "[email protected]",
"emailReadOnly": true,
"emailVerified": true,
"firstLogin": "2021-01-29T13:01:33.294216Z",
"fullName": "John Doe",
"fullNameReadOnly": true,
"groupsReadOnly": false,
"id": "32168415841",
"isSystem": false,
"lastLogin": "2022-09-12T08:51:00.159750Z",
"lowestRole": "Admin",
"primaryTwoFaMethod": "application",
"scope": "account",
"scopeRoles": [
{
"id": "68418945648943589",
"name": "AT || ACME Inc.",
"roleId": "9848949354653168",
"roleName": "Admin",
"roles": [
"Admin"
]
}
],
"siteRoles": [],
"source": "sso_saml",
"tenantRoles": [],
"twoFaEnabled": true
}
I'm trying to write certain data into an excel file with:
df = pd.json_normalize(result)
df.head()
df[['scope', 'fullName', 'email', 'lowestRole', 'scope',
'scopeRoles.name']].to_excel(completename)
But I struggle with 'scopeRoles.name' as it's nested.
with the code above I get
raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Index(['scope', 'fullName', 'email', 'lowestRole', 'scope', 'scopeRoles.name'], dtype='object')] are in the [columns]"
I also tried different versions, but always failed.
I basically need to understand how I can specify the fields to write into excel when the field itself is nested. If I just use "non-nested" entries it works perfectly fine
thanks
CodePudding user response:
You need to flatten your JSON data file.
You could use the flatten_json
package.
pip install flatten_json
from flatten_json import flatten
unflat_json = {'user':
{'Rachel':
{'UserID': 1717171717,
'Email': '[email protected]',
'friends': ['John', 'Jeremy', 'Emily']
}
}
}
flat_json = flatten(unflat_json)
print(flat_json)
Output:
{‘user_Rachel_UserID’: 1717171717, ‘user_Rachel_Email’: ‘[email protected]’, ‘user_Rachel_friends_0’: ‘John’, ‘user_Rachel_friends_1’: ‘Jeremy’, ‘user_Rachel_friends_2’: ‘Emily’}
CodePudding user response:
To deal with a list of dictionaries, you can use df.from_records()
. But, you need to process it separately to combine each dataframe together. I assumed the data used is exactly the same, considering the df['scopeRoles']
only consisted of one element. Please try something like this:
import pandas as pd
result = {
"apiToken": {
"createdAt": "2022-03-04T12:18:29.000956Z",
"expiresAt": "2022-09-04T12:18:29.000956Z"
},
"canGenerateApiToken": True,
"dateJoined": "2021-01-29T10:07:04.395172Z",
"email": "[email protected]",
"emailReadOnly": True,
"emailVerified": True,
"firstLogin": "2021-01-29T13:01:33.294216Z",
"fullName": "John Doe",
"fullNameReadOnly": True,
"groupsReadOnly": False,
"id": "32168415841",
"isSystem": False,
"lastLogin": "2022-09-12T08:51:00.159750Z",
"lowestRole": "Admin",
"primaryTwoFaMethod": "application",
"scope": "account",
"scopeRoles": [
{
"id": "68418945648943589",
"name": "AT || ACME Inc.",
"roleId": "9848949354653168",
"roleName": "Admin",
"roles": [
"Admin"
]
}
],
"siteRoles": [],
"source": "sso_saml",
"tenantRoles": [],
"twoFaEnabled": True
}
df = pd.json_normalize(result)
df2 = df[['scope', 'fullName', 'email', 'lowestRole', 'scope']]
# from_records() returns a dataframe from a list of dict df['scopeRoles'].
df3 = df.from_records(df["scopeRoles"][0])
# join df2 and df3
res = df2.join(df3)
print(res)
I hope this code helps!
EDIT
To get the name
column only, you just have to subscript like so:
df3 = df.from_records(df["scopeRoles"][0])['name']