I'm trying to flatten a JSON file that was originally converted from XML using xmltodict()
. There are multiple fields that may have a list of dictionaries. I've tried using record_path with meta data to no avail, but I have not been able to get it to work when there are multiple fields that may have other nested fields. It's expected that some fields will be empty for any given record
I have tried searching for another topic and couldn't find my specific problem with multiple nested fields. Can anyone point me in the right direction?
Thanks for any help that can be provided!
Sample base Python (without the record path)
import pandas as pd
import json
with open('./example.json', encoding="UTF-8") as json_file:
json_dict = json.load(json_file)
df = pd.json_normalize(json_dict['WIDGET'])
print(df)
df.to_csv('./test.csv', index=False)
Sample JSON
{
"WIDGET": [
{
"ID": "6",
"PROBLEM": "Electrical",
"SEVERITY_LEVEL": "1",
"TITLE": "Battery's Missing",
"CATEGORY": "User Error",
"LAST_SERVICE": "2020-01-04T17:39:37Z",
"NOTICE_DATE": "2022-01-01T08:00:00Z",
"FIXABLE": "1",
"COMPONENTS": {
"WHATNOTS": {
"WHATNOT1": "Battery Compartment",
"WHATNOT2": "Whirlygig"
}
},
"DIAGNOSIS": "Customer needs to put batteries in the battery compartment",
"STATUS": "0",
"CONTACT_TYPE": {
"CALL": "1"
}
},
{
"ID": "1004",
"PROBLEM": "Electrical",
"SEVERITY_LEVEL": "4",
"TITLE": "Flames emit from unit",
"CATEGORY": "Dangerous",
"LAST_SERVICE": "2015-06-04T21:40:12Z",
"NOTICE_DATE": "2022-01-01T08:00:00Z",
"FIXABLE": "0",
"DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it",
"CONSEQUENCE": "Could burn things",
"SOLUTION": "Call an exorcist",
"KNOWN_PROBLEMS": {
"PROBLEM": [
{
"TYPE": "RECALL",
"NAME": "Bad Servo",
"DESCRIPTION": "Bad servo's shipped in initial product"
},
{
"TYPE": "FAILURE",
"NAME": "Operating outside normal conditions",
"DESCRIPTION": "Device failed when customer threw into wood chipper"
}
]
},
"STATUS": "1",
"REPAIR_BULLETINS": {
"BULLETIN": [
{
"@id": "4",
"#text": "Known target of the occult"
},
{
"@id": "5",
"#text": "Not meant to be thrown into wood chippers"
}
]
},
"CONTACT_TYPE": {
"CALL": "1"
}
}
]
}
Sample CSV
ID | PROBLEM | SEVERITY_LEVEL | TITLE | CATEGORY | LAST_SERVICE | NOTICE_DATE | FIXABLE | DIAGNOSIS | STATUS | COMPONENTS.WHATNOTS.WHATNOT1 | COMPONENTS.WHATNOTS.WHATNOT2 | CONTACT_TYPE.CALL | CONSEQUENCE | SOLUTION | KNOWN_PROBLEMS.PROBLEM | REPAIR_BULLETINS.BULLETIN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | Electrical | 1 | Battery's Missing | User Error | 2020-01-04T17:39:37Z | 2022-01-01T08:00:00Z | 1 | Customer needs to put batteries in the battery compartment | 0 | Battery Compartment | Whirlygig | 1 | ||||
1004 | Electrical | 4 | Flames emit from unit | Dangerous | 2015-06-04T21:40:12Z | 2022-01-01T08:00:00Z | 0 | A demon seems to have possessed the unit and his expelling flames from it | 1 | 1 | Could burn things | Call an exorcist | [{'TYPE': 'RECALL', 'NAME': 'Bad Servo', 'DESCRIPTION': "Bad servo's shipped in initial product"}, {'TYPE': 'FAILURE', 'NAME': 'Operating outside normal conditions', 'DESCRIPTION': 'Device failed when customer threw into wood chipper'}] | [{'@id': '4', '#text': 'Known target of the occult'}, {'@id': '5', '#text': 'Not meant to be thrown into wood chippers'}] |
CodePudding user response:
I have attempted to extract the data and turned it into nested dictionary (instead of nested with list), so that pd.json_normalize()
can work
for row in range(len(json_dict['WIDGET'])):
try:
lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
del json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
for i, item in enumerate(lis):
json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item
lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
del json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
for i, item in enumerate(lis):
json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item
except KeyError:
continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
Output:
0 1
ID 6 1004
PROBLEM Electrical Electrical
SEVERITY_LEVEL 1 4
TITLE Battery's Missing Flames emit from unit
CATEGORY User Error Dangerous
LAST_SERVICE 2020-01-04T17:39:37Z 2015-06-04T21:40:12Z
NOTICE_DATE 2022-01-01T08:00:00Z 2022-01-01T08:00:00Z
FIXABLE 1 0
DIAGNOSIS Customer needs to put batt... A demon seems to have poss...
STATUS 0 1
COMPONENTS.WHATNOTS.WHATNOT1 Battery Compartment NaN
COMPONENTS.WHATNOTS.WHATNOT2 Whirlygig NaN
CONTACT_TYPE.CALL 1 1
CONSEQUENCE NaN Could burn things
SOLUTION NaN Call an exorcist
KNOWN_PROBLEMS.0.TYPE NaN RECALL
KNOWN_PROBLEMS.0.NAME NaN Bad Servo
KNOWN_PROBLEMS.0.DESCRIPTION NaN Bad servo's shipped in ini...
KNOWN_PROBLEMS.1.TYPE NaN FAILURE
KNOWN_PROBLEMS.1.NAME NaN Operating outside normal c...
KNOWN_PROBLEMS.1.DESCRIPTION NaN Device failed when custome...
REPAIR_BULLETINS.0.@id NaN 4
REPAIR_BULLETINS.0.#text NaN Known target of the occult
REPAIR_BULLETINS.1.@id NaN 5
REPAIR_BULLETINS.1.#text NaN Not meant to be thrown int...