Home > Mobile >  python parse data from nested json
python parse data from nested json

Time:09-20

I have the following (excerpt) json data structure:

{
    "apiToken": {
        "createdAt": "2022-03-04T12:18:29.000956Z",
        "expiresAt": "2022-09-04T12:18:29.000956Z"
    },
    "canGenerateApiToken": true,
    "dateJoined": "2021-01-29T10:07:04.395172Z",
    "email": "[email protected]",
    "emailReadOnly": true,
    "emailVerified": true,
    "firstLogin": "2021-01-29T13:01:33.294216Z",
    "fullName": "John Doe",
    "fullNameReadOnly": true,
    "groupsReadOnly": false,
    "id": "32168415841",
    "isSystem": false,
    "lastLogin": "2022-09-12T08:51:00.159750Z",
    "lowestRole": "Admin",
    "primaryTwoFaMethod": "application",
    "scope": "account",
    "scopeRoles": [
        {
            "id": "68418945648943589",
            "name": "AT || ACME Inc.",
            "roleId": "9848949354653168",
            "roleName": "Admin",
            "roles": [
                "Admin"
            ]
        }
    ],
    "siteRoles": [],
    "source": "sso_saml",
    "tenantRoles": [],
    "twoFaEnabled": true
}

I'm trying to write certain data into an excel file with:

df = pd.json_normalize(result)
df.head()
df[['scope', 'fullName', 'email', 'lowestRole', 'scope',
    'scopeRoles.name']].to_excel(completename)

But I struggle with 'scopeRoles.name' as it's nested.

with the code above I get

raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Index(['scope', 'fullName', 'email', 'lowestRole', 'scope', 'scopeRoles.name'], dtype='object')] are in the [columns]"

I also tried different versions, but always failed.

I basically need to understand how I can specify the fields to write into excel when the field itself is nested. If I just use "non-nested" entries it works perfectly fine

thanks

CodePudding user response:

You need to flatten your JSON data file. You could use the flatten_json package.

pip install flatten_json

from flatten_json import flatten

unflat_json = {'user':
           {'Rachel':
            {'UserID': 1717171717,
             'Email': '[email protected]',
             'friends': ['John', 'Jeremy', 'Emily']
             }
            }
           }

flat_json = flatten(unflat_json)

print(flat_json)

Output:

{‘user_Rachel_UserID’: 1717171717, ‘user_Rachel_Email’: ‘[email protected]’, ‘user_Rachel_friends_0’: ‘John’, ‘user_Rachel_friends_1’: ‘Jeremy’, ‘user_Rachel_friends_2’: ‘Emily’}

CodePudding user response:

To deal with a list of dictionaries, you can use df.from_records(). But, you need to process it separately to combine each dataframe together. I assumed the data used is exactly the same, considering the df['scopeRoles'] only consisted of one element. Please try something like this:

import pandas as pd

result = {
    "apiToken": {
        "createdAt": "2022-03-04T12:18:29.000956Z",
        "expiresAt": "2022-09-04T12:18:29.000956Z"
    },
    "canGenerateApiToken": True,
    "dateJoined": "2021-01-29T10:07:04.395172Z",
    "email": "[email protected]",
    "emailReadOnly": True,
    "emailVerified": True,
    "firstLogin": "2021-01-29T13:01:33.294216Z",
    "fullName": "John Doe",
    "fullNameReadOnly": True,
    "groupsReadOnly": False,
    "id": "32168415841",
    "isSystem": False,
    "lastLogin": "2022-09-12T08:51:00.159750Z",
    "lowestRole": "Admin",
    "primaryTwoFaMethod": "application",
    "scope": "account",
    "scopeRoles": [
        {
            "id": "68418945648943589",
            "name": "AT || ACME Inc.",
            "roleId": "9848949354653168",
            "roleName": "Admin",
            "roles": [
                "Admin"
            ]
        }
    ],
    "siteRoles": [],
    "source": "sso_saml",
    "tenantRoles": [],
    "twoFaEnabled": True
}

df = pd.json_normalize(result)
df2 = df[['scope', 'fullName', 'email', 'lowestRole', 'scope']]

# from_records() returns a dataframe from a list of dict df['scopeRoles'].
df3 = df.from_records(df["scopeRoles"][0])

# join df2 and df3
res = df2.join(df3)
print(res)

I hope this code helps!

EDIT

To get the name column only, you just have to subscript like so:

df3 = df.from_records(df["scopeRoles"][0])['name']
  • Related