I have a JSON object (a JSON string) with values like this:
[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "[email protected]",
            null
        ],
        "stewards": [
            "[email protected]",
            ''
        ],
        "verified_use_cases": [
            null,
            null,
            "c4a48296-fd92-3606-bf84-99aacdf22a20",
            null
        ],
        "classifications": [
            null
        ],
        "domains": []
    }
]
But the final format I want has the nulls and the empty list items removed, something like this:
[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "[email protected]"
        ],
        "stewards": [
            "[email protected]"
        ],
        "verified_use_cases": [
            "c4a48296-fd92-3606-bf84-99aacdf22a20"
        ],
        "classifications": [],
        "domains": []
    }
]
I want the output to exclude nulls and empty strings so that it looks cleaner. I need to do this recursively for all the lists in all the JSON documents I have.
Better still, it would be helpful if I could do it in one go rather than looping through each element.
I only need to clean the lists, though.
Can anyone please help me with this? Thanks in advance.
CodePudding user response:
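import json

def recursive_dict_clean(d):
    for k, v in d.items():
        if isinstance(v, list):
            # Slice assignment edits the list in place; "if i" drops every falsy item
            v[:] = [i for i in v if i]
        if isinstance(v, dict):
            # Recurse into nested dicts
            recursive_dict_clean(v)
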
data = json.loads("""[{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]",
null
],
"stewards": [
"[email protected]"
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}]""")
for d in data:
    recursive_dict_clean(d)
print(data)
Output:
[{'id': 1,
'object_k_id': '',
'object_type': 'report',
'object_meta': {'source_id': 0, 'report': 'Customers'},
'description': 'Daily metrics for all customers',
'business_name': '',
'business_logic': '',
'owners': ['[email protected]'],
'stewards': ['[email protected]'],
'verified_use_cases': ['c4a48296-fd92-3606-bf84-99aacdf22a20'],
'classifications': [],
'domains': []}]
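P.S.: Your JSON string is not valid: the empty string in stewards is written with single quotes (''), and JSON only allows double quotes ("").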
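One thing to keep in mind with the "if i" filter: it drops every falsy item, so 0, False and empty nested lists or dicts disappear along with None and "". A small sketch of the difference, using a made-up items list rather than the question's data:

```python
# Hypothetical sample list; not taken from the question's data
items = [0, False, None, "", "x", [], {}]

# Truthiness filter, as in the answer: keeps only truthy items
truthy_only = [i for i in items if i]

# Targeted filter: removes only None and the empty string
targeted = [i for i in items if i is not None and i != ""]

print(truthy_only)
print(targeted)
```

If the lists only ever hold strings and nulls, as in the question, both filters give the same result.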
CodePudding user response:
You can convert your JSON to a dict, then use the function below and convert it back to JSON:
def clean_dict(input_dict):
    output = {}
    for key, value in input_dict.items():
        if isinstance(value, dict):
            output[key] = clean_dict(value)
        elif isinstance(value, list):
            output[key] = []
            for item in value:
                # Test each item, not the enclosing list
                if isinstance(item, dict):
                    output[key].append(clean_dict(item))
                elif item not in [None, '']:
                    output[key].append(item)
        else:
            output[key] = value
    return output
Thanks to N.O
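Since the question's JSON has a list at the top level, a function that calls .items() on its argument cannot be applied to the decoded value directly; it has to be called per element. A minimal sketch of the whole round trip, assuming a hypothetical helper clean_lists that accepts any decoded JSON value, and a shortened, made-up input string:

```python
import json

def clean_lists(value, drop=(None, "")):
    """Return a copy of value with items in `drop` removed from every list."""
    if isinstance(value, dict):
        # Dict values are never dropped; recurse only to reach nested lists
        return {k: clean_lists(v, drop) for k, v in value.items()}
    if isinstance(value, list):
        # Filter first, then recurse into the surviving items
        return [clean_lists(item, drop) for item in value if item not in drop]
    return value

# Made-up input modelled on the question's structure
raw = (
    '[{"id": 1, "business_name": "", '
    '"owners": ["owner@example.com", null], '
    '"classifications": [null], '
    '"object_meta": {"tags": ["", "a"]}}]'
)

cleaned = clean_lists(json.loads(raw))
print(json.dumps(cleaned, indent=4))
```

This version builds a new structure instead of editing in place, and it leaves empty strings that are plain dict values (such as business_name) untouched.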
CodePudding user response:
You can use the object_pairs_hook parameter of json.loads to clean the data as you decode it from your string.
https://docs.python.org/3/library/json.html#json.load
This function runs every time the decoder might call dict() and removes all None objects from lists as it goes, using a simple list comprehension, otherwise leaving the data alone and letting the decoder do its thing.
#!/usr/bin/env python3
import json
data_string = """[
{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]",
null
],
"stewards": [
"[email protected]",
""
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}
]"""
def json_hook(obj):
    # obj is the list of (key, value) pairs for one decoded JSON object
    return_obj = {}
    for k, v in obj:
        if isinstance(v, list):
            v = [x for x in v if x is not None]
        return_obj[k] = v
    return return_obj

data = json.loads(data_string, object_pairs_hook=json_hook)
print(json.dumps(data, indent=4))
Result:
[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "[email protected]"
        ],
        "stewards": [
            "[email protected]",
            ""
        ],
        "verified_use_cases": [
            "c4a48296-fd92-3606-bf84-99aacdf22a20"
        ],
        "classifications": [],
        "domains": []
    }
]
In your example you removed the "" value from stewards. If you want that behaviour, you can replace is not None with not in (None, ""), but it seemed like that might have been a mistake, since you left empty strings in other places.
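As a sketch of that variant, here is a hook that drops both None and "" from lists. The name drop_empty_hook, the DROP tuple and the sample text are assumptions made for illustration. Note that the hook only fires for JSON objects, so a list that is not directly a value of some object (a top-level array, or a list nested inside another list) is left as decoded.

```python
import json

# Values to remove from lists; an assumption based on the question's wording
DROP = (None, "")

def drop_empty_hook(pairs):
    # pairs is the list of (key, value) tuples for one decoded JSON object
    cleaned = {}
    for key, value in pairs:
        if isinstance(value, list):
            value = [x for x in value if x not in DROP]
        cleaned[key] = value
    return cleaned

# Made-up sample; the address is a placeholder
text = (
    '{"stewards": ["steward@example.com", "", null], '
    '"name": "", '
    '"nested": {"tags": [null, "x"]}}'
)

data = json.loads(text, object_pairs_hook=drop_empty_hook)
print(json.dumps(data, indent=4))

# A top-level array is not an object, so the hook never sees it
untouched = json.loads('[null, 1]', object_pairs_hook=drop_empty_hook)
```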