I have a large nested JSON (structure snippet below) from which I am extracting the key/value pairs for the keys FLOODING, date, and filename. However, not all of the dictionaries have the key/value pair for filename, and I want to omit those before or when extracting.
'74': { '18': { 'FLOODING': True,
                'FULL-DATA-COVERAGE': True,
                'date': '2019-05-03'},
        '19': { 'FLOODING': True,
                'FULL-DATA-COVERAGE': True,
                'date': '2019-05-06'},
        '2': { 'FLOODING': False,
               'FULL-DATA-COVERAGE': True,
               'date': '2019-03-02',
               'filename': 'S2_2019-03-02'},
        '20': { 'FLOODING': True,
                'FULL-DATA-COVERAGE': False,
                'date': '2019-05-08'},
…
I have a function (see code below) that works nicely to extract the values for a given key. When every dictionary in the JSON has the same keys, the extracted arrays have equal lengths. When some dictionaries are missing keys, they do not. Therefore, I would like to omit the dictionaries with missing keys, but I am stumped as to how to achieve this. It seems likely that the omission should happen before extracting the information, with the filtered JSON then saved as a new file.
def json_extract(obj, key):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values
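For illustration, here is a minimal sketch of the length mismatch described above. The `sample` dict is a hypothetical two-entry cut-down of the snippet, and `collect` is an assumed compact stand-in for `json_extract`, included only so the example runs on its own.

```python
# Hypothetical sample modeled on the snippet above; only entry '2' has 'filename'.
sample = {
    '74': {
        '18': {'FLOODING': True, 'FULL-DATA-COVERAGE': True,
               'date': '2019-05-03'},
        '2': {'FLOODING': False, 'FULL-DATA-COVERAGE': True,
              'date': '2019-03-02', 'filename': 'S2_2019-03-02'},
    }
}


def collect(obj, key):
    """Compact stand-in for json_extract: gather every value stored under key."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                found.extend(collect(v, key))
            elif k == key:
                found.append(v)
    elif isinstance(obj, list):
        for item in obj:
            found.extend(collect(item, key))
    return found


dates = collect(sample, 'date')
filenames = collect(sample, 'filename')
print(len(dates), len(filenames))  # 2 1
```

Both entries contribute a date, but only one contributes a filename, so the two lists can no longer be paired up by position.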
CodePudding user response:
It looks like your data has a fixed structure, whereas the recursive code is written for the general case and can handle any data structure, no matter how deeply the dicts and lists are nested. The disadvantage is that it has no idea where in the data structure it is currently operating, so it cannot tell which values belong to the same dictionary.
If my assumption is true, simple non-recursive code can walk through obj:
arr = []  # collected values for key
for d1 in obj.values():  # top level is a dict
    for d2 in d1.values():  # level below is also a dict
        if 'filename' not in d2:
            continue  # disregard this one, it is incomplete
        if key in d2:
            arr.append(d2[key])  # extract value
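The loop above could be wrapped up roughly as follows. This is only a sketch under the same two-level assumption: the function names `extract_complete` and `drop_incomplete`, the `required` parameter, and the `sample` data are all hypothetical, not part of your code.

```python
import json


def extract_complete(obj, keys, required='filename'):
    """Return one list per key, using only entries that contain `required`."""
    columns = {key: [] for key in keys}
    for d1 in obj.values():          # top level is a dict
        for d2 in d1.values():       # level below is also a dict
            if required not in d2:
                continue             # incomplete entry, skip it
            for key in keys:
                # .get() yields None for any other absent key,
                # which keeps all lists the same length
                columns[key].append(d2.get(key))
    return columns


def drop_incomplete(obj, required='filename'):
    """Return a copy of obj without the entries that lack `required`."""
    return {
        outer: {inner: d2 for inner, d2 in d1.items() if required in d2}
        for outer, d1 in obj.items()
    }


# Hypothetical sample modeled on the snippet in the question.
sample = {
    '74': {
        '18': {'FLOODING': True, 'FULL-DATA-COVERAGE': True,
               'date': '2019-05-03'},
        '2': {'FLOODING': False, 'FULL-DATA-COVERAGE': True,
              'date': '2019-03-02', 'filename': 'S2_2019-03-02'},
    }
}

columns = extract_complete(sample, ['FLOODING', 'date', 'filename'])
filtered = drop_incomplete(sample)
filtered_text = json.dumps(filtered, indent=2)  # json.dump(filtered, fh) would write to a file
```

Collecting all keys in one pass means every list is built from the same set of entries, so they line up by position. If you prefer to filter first, `drop_incomplete` gives you the reduced structure to save, and your existing `json_extract` should then return equal-length lists on it.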