Home > Blockchain >  Python Pandas json_normalize with multiple lists of dicts
Python Pandas json_normalize with multiple lists of dicts

Time:08-19

I'm trying to flatten a JSON file that was originally converted from XML using xmltodict(). There are multiple fields that may have a list of dictionaries. I've tried using record_path with meta data to no avail, but I have not been able to get it to work when there are multiple fields that may have other nested fields. It's expected that some fields will be empty for any given record

I have tried searching for another topic and couldn't find my specific problem with multiple nested fields. Can anyone point me in the right direction?

Thanks for any help that can be provided!

Sample base Python (without the record path)

import pandas as pd
import json

with open('./example.json', encoding="UTF-8") as json_file:
    json_dict = json.load(json_file)

df = pd.json_normalize(json_dict['WIDGET'])
print(df)

df.to_csv('./test.csv', index=False)

Sample JSON

{
    "WIDGET": [
        {
            "ID": "6",
            "PROBLEM": "Electrical",
            "SEVERITY_LEVEL": "1",
            "TITLE": "Battery's Missing",
            "CATEGORY": "User Error",
            "LAST_SERVICE": "2020-01-04T17:39:37Z",
            "NOTICE_DATE": "2022-01-01T08:00:00Z",
            "FIXABLE": "1",
            "COMPONENTS": {
                "WHATNOTS": {
                    "WHATNOT1": "Battery Compartment",
                    "WHATNOT2": "Whirlygig"
                }
            },
            "DIAGNOSIS": "Customer needs to put batteries in the battery compartment",
            "STATUS": "0",
            "CONTACT_TYPE": {
                "CALL": "1"
            }
        },
        {
            "ID": "1004",
            "PROBLEM": "Electrical",
            "SEVERITY_LEVEL": "4",
            "TITLE": "Flames emit from unit",
            "CATEGORY": "Dangerous",
            "LAST_SERVICE": "2015-06-04T21:40:12Z",
            "NOTICE_DATE": "2022-01-01T08:00:00Z",
            "FIXABLE": "0",
            "DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it",
            "CONSEQUENCE": "Could burn things",
            "SOLUTION": "Call an exorcist",
            "KNOWN_PROBLEMS": {
                "PROBLEM": [
                    {
                        "TYPE": "RECALL",
                        "NAME": "Bad Servo",
                        "DESCRIPTION": "Bad servo's shipped in initial product"
                    },
                    {
                        "TYPE": "FAILURE",
                        "NAME": "Operating outside normal conditions",
                        "DESCRIPTION": "Device failed when customer threw into wood chipper"
                    }
                ]
            },
            "STATUS": "1",
            "REPAIR_BULLETINS": {
                "BULLETIN": [
                    {
                        "@id": "4",
                        "#text": "Known target of the occult"
                    },
                    {
                        "@id": "5",
                        "#text": "Not meant to be thrown into wood chippers"
                    }
                ]
            },
            "CONTACT_TYPE": {
                "CALL": "1"
            }
        }        
    ]
}

Sample CSV

ID PROBLEM SEVERITY_LEVEL TITLE CATEGORY LAST_SERVICE NOTICE_DATE FIXABLE DIAGNOSIS STATUS COMPONENTS.WHATNOTS.WHATNOT1 COMPONENTS.WHATNOTS.WHATNOT2 CONTACT_TYPE.CALL CONSEQUENCE SOLUTION KNOWN_PROBLEMS.PROBLEM REPAIR_BULLETINS.BULLETIN
6 Electrical 1 Battery's Missing User Error 2020-01-04T17:39:37Z 2022-01-01T08:00:00Z 1 Customer needs to put batteries in the battery compartment 0 Battery Compartment Whirlygig 1
1004 Electrical 4 Flames emit from unit Dangerous 2015-06-04T21:40:12Z 2022-01-01T08:00:00Z 0 A demon seems to have possessed the unit and his expelling flames from it 1 1 Could burn things Call an exorcist [{'TYPE': 'RECALL', 'NAME': 'Bad Servo', 'DESCRIPTION': "Bad servo's shipped in initial product"}, {'TYPE': 'FAILURE', 'NAME': 'Operating outside normal conditions', 'DESCRIPTION': 'Device failed when customer threw into wood chipper'}] [{'@id': '4', '#text': 'Known target of the occult'}, {'@id': '5', '#text': 'Not meant to be thrown into wood chippers'}]

CodePudding user response:

I have attempted to extract the data and turned it into nested dictionary (instead of nested with list), so that pd.json_normalize() can work

for row in range(len(json_dict['WIDGET'])):
    try:
        lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        del json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item
            
        lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        del json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item
            
    except KeyError:
        continue

df = pd.json_normalize(json_dict['WIDGET']).T
print(df)

Output:

                                                          0                              1
ID                                                        6                           1004
PROBLEM                                          Electrical                     Electrical
SEVERITY_LEVEL                                            1                              4
TITLE                                     Battery's Missing          Flames emit from unit
CATEGORY                                         User Error                      Dangerous
LAST_SERVICE                           2020-01-04T17:39:37Z           2015-06-04T21:40:12Z
NOTICE_DATE                            2022-01-01T08:00:00Z           2022-01-01T08:00:00Z
FIXABLE                                                   1                              0
DIAGNOSIS                     Customer needs to put batt...  A demon seems to have poss...
STATUS                                                    0                              1
COMPONENTS.WHATNOTS.WHATNOT1            Battery Compartment                            NaN
COMPONENTS.WHATNOTS.WHATNOT2                      Whirlygig                            NaN
CONTACT_TYPE.CALL                                         1                              1
CONSEQUENCE                                             NaN              Could burn things
SOLUTION                                                NaN               Call an exorcist
KNOWN_PROBLEMS.0.TYPE                                   NaN                         RECALL
KNOWN_PROBLEMS.0.NAME                                   NaN                      Bad Servo
KNOWN_PROBLEMS.0.DESCRIPTION                            NaN  Bad servo's shipped in ini...
KNOWN_PROBLEMS.1.TYPE                                   NaN                        FAILURE
KNOWN_PROBLEMS.1.NAME                                   NaN  Operating outside normal c...
KNOWN_PROBLEMS.1.DESCRIPTION                            NaN  Device failed when custome...
REPAIR_BULLETINS.0.@id                                  NaN                              4
REPAIR_BULLETINS.0.#text                                NaN     Known target of the occult
REPAIR_BULLETINS.1.@id                                  NaN                              5
REPAIR_BULLETINS.1.#text                                NaN  Not meant to be thrown int...
  • Related