removing null/empty values in lists of a json object in python recursively

Time:04-04

I have a JSON object (as a JSON string) with values like this:

[
   {
      "id": 1,
      "object_k_id": "",
      "object_type": "report",
      "object_meta": {
         "source_id": 0,
         "report": "Customers"
      },
      "description": "Daily metrics for all customers",
      "business_name": "",
      "business_logic": "",
      "owners": [
         "[email protected]",
          null
      ],
      "stewards": [
         "[email protected]",
         ''
      ],
      "verified_use_cases": [
         null,
         null,
         "c4a48296-fd92-3606-bf84-99aacdf22a20",
         null
      ],
      "classifications": [
         null
      ],
      "domains": []
   }
]

But the final format I want has the nulls and empty list items removed, something like this:

[
   {
      "id": 1,
      "object_k_id": "",
      "object_type": "report",
      "object_meta": {
         "source_id": 0,
         "report": "Customers"
      },
      "description": "Daily metrics for all customers",
      "business_name": "",
      "business_logic": "",
      "owners": [
         "[email protected]"
      ],
      "stewards": [
         "[email protected]"
      ],
      "verified_use_cases": [
         "c4a48296-fd92-3606-bf84-99aacdf22a20"
      ],
      "classifications": [],
      "domains": []
   }
]

I want the output to exclude nulls and empty strings so it looks cleaner. I need to do this recursively for all the lists in all the JSON objects I have.

Beyond recursion, it would help if I could do it in one pass rather than looping through each element.

I only need to clean the lists, though.

Can anyone please help me with this? Thanks in advance.

Answer:

import json


def recursive_dict_clean(d):
    for v in d.values():
        if isinstance(v, list):
            # Keep only truthy items in place (drops None and "")
            v[:] = [i for i in v if i]
        if isinstance(v, dict):
            # Recurse into nested objects
            recursive_dict_clean(v)


data = json.loads("""[{
    "id": 1,
    "object_k_id": "",
    "object_type": "report",
    "object_meta": {
        "source_id": 0,
        "report": "Customers"
    },
    "description": "Daily metrics for all customers",
    "business_name": "",
    "business_logic": "",
    "owners": [
        "[email protected]",
        null
    ],
    "stewards": [
        "[email protected]"
    ],
    "verified_use_cases": [
        null,
        null,
        "c4a48296-fd92-3606-bf84-99aacdf22a20",
        null
    ],
    "classifications": [
        null
    ],
    "domains": []
}]""")


for d in data:
    recursive_dict_clean(d)

print(data)

Output:
[{'id': 1,
  'object_k_id': '',
  'object_type': 'report',
  'object_meta': {'source_id': 0, 'report': 'Customers'},
  'description': 'Daily metrics for all customers',
  'business_name': '',
  'business_logic': '',
  'owners': ['[email protected]'],
  'stewards': ['[email protected]'],
  'verified_use_cases': ['c4a48296-fd92-3606-bf84-99aacdf22a20'],
  'classifications': [],
  'domains': []}]

P.S.: Your JSON string is not valid: the '' in "stewards" uses single quotes, but JSON strings must be written with double quotes.
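One caveat with [i for i in v if i]: it filters on truthiness, so besides None and "" it also drops 0, False, and empty lists or dicts inside a list. The short sketch below (the sample values are made up for illustration) compares that filter with an explicit check that removes only None and the empty string:

```python
# Hypothetical sample list containing several falsy values.
values = ["a", None, "", 0, False, [], {}]

# Truthiness filter, as in `[i for i in v if i]`: drops every falsy value.
truthy_only = [v for v in values if v]

# Explicit filter: drops only None and the empty string.
explicit = [v for v in values if v is not None and v != ""]

print(truthy_only)  # ['a']
print(explicit)     # ['a', 0, False, [], {}]
```

If your lists can legitimately contain 0 or False, the explicit check is the safer choice.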

Answer:

You can parse your JSON into Python objects with json.loads, run the function below on each dict, and then convert the result back to JSON with json.dumps:

def clean_dict(input_dict):
    output = {}
    for key, value in input_dict.items():
        if isinstance(value, dict):
            # Recurse into nested objects
            output[key] = clean_dict(value)
        elif isinstance(value, list):
            output[key] = []
            for item in value:
                if isinstance(item, dict):
                    # Clean dicts that appear inside lists
                    output[key].append(clean_dict(item))
                elif item not in (None, ''):
                    # Keep every list item except None and ''
                    output[key].append(item)
        else:
            # Non-list values (including '') are kept unchanged
            output[key] = value
    return output

Thanks to N.O
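If you want to clean the whole document in one call, including a top-level array and lists nested directly inside other lists, the same idea can be generalised to any parsed JSON value. This is a minimal sketch, not either answer's code: clean_lists is a hypothetical name, and the sample string is made-up data. Like the question asks, only list items are filtered; None or "" stored directly as a dict value is left alone.

```python
import json
from typing import Any


def clean_lists(value: Any) -> Any:
    """Return a copy of a parsed JSON value with None and "" removed from every list.

    Dict values are kept as they are; only list items are filtered.
    """
    if isinstance(value, dict):
        # Rebuild the dict, cleaning each value recursively
        return {k: clean_lists(v) for k, v in value.items()}
    if isinstance(value, list):
        # Filter the items first, then clean whatever survives
        return [clean_lists(item) for item in value
                if item is not None and item != ""]
    # Scalars (str, int, float, bool, None) are returned unchanged
    return value


# Hypothetical sample input
raw = ('[{"id": 1, "business_name": "", '
       '"owners": ["a@example.com", null], '
       '"stewards": ["b@example.com", ""], '
       '"classifications": [null], "nested": [[null, "x"]]}]')

cleaned = clean_lists(json.loads(raw))
print(json.dumps(cleaned, indent=4))
```

A list that ends up empty after filtering stays as [], which matches the "classifications": [] entry in the desired output.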

Answer:

You can use the built-in object_pairs_hook parameter of json.loads to clean the data as it is decoded from your string.

https://docs.python.org/3/library/json.html#json.load

The hook runs every time the decoder would otherwise call dict() on a JSON object. It removes all None items from that object's lists with a simple list comprehension and otherwise leaves the data alone, letting the decoder do its thing.
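To see the order in which the decoder calls the hook, here is a small sketch (the record function and the sample string are made up for illustration). Inner objects are passed to the hook before the object that contains them, so by the time the hook sees a list, any dicts inside it have already gone through the hook.

```python
import json

calls = []


def record(pairs):
    # pairs is a list of (key, value) tuples for one JSON object
    calls.append([key for key, _ in pairs])
    return dict(pairs)


result = json.loads('{"outer": {"inner": 1}, "x": 2}', object_pairs_hook=record)
print(calls)   # [['inner'], ['outer', 'x']]
print(result)  # {'outer': {'inner': 1}, 'x': 2}
```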

#!/usr/bin/env python3
import json
data_string = """[
   {
      "id": 1,
      "object_k_id": "",
      "object_type": "report",
      "object_meta": {
         "source_id": 0,
         "report": "Customers"
      },
      "description": "Daily metrics for all customers",
      "business_name": "",
      "business_logic": "",
      "owners": [
         "[email protected]",
          null
      ],
      "stewards": [
         "[email protected]",
         ""
      ],
      "verified_use_cases": [
         null,
         null,
         "c4a48296-fd92-3606-bf84-99aacdf22a20",
         null
      ],
      "classifications": [
         null
      ],
      "domains": []
   }
]"""

def json_hook(obj):
    return_obj = {}
    for k, v in obj:
        if isinstance(v, list):
            v = [x for x in v if x is not None]

        return_obj[k] = v

    return return_obj

data = json.loads(data_string, object_pairs_hook=json_hook)

print(json.dumps(data, indent=4))

Result:

[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "[email protected]"
        ],
        "stewards": [
            "[email protected]",
            ""
        ],
        "verified_use_cases": [
            "c4a48296-fd92-3606-bf84-99aacdf22a20"
        ],
        "classifications": [],
        "domains": []
    }
]

In your example you also removed the "" value from stewards. If you want that behaviour, you can replace is not None with not in (None, ""). It seemed like that might have been a mistake, though, since you left empty strings in other places.
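A minimal sketch of that variant, assuming you want both None and "" removed (x is not None and x != "" below is equivalent to x not in (None, "")). The helper names _clean_list and drop_empty_hook and the sample string are hypothetical. Two things the original hook does not cover are handled here: lists nested directly inside lists are cleaned recursively, and because the hook is only called for JSON objects, a top-level array is cleaned once after decoding.

```python
import json


def _clean_list(items):
    # Drop None and "" from a list and recurse into nested lists.
    # Dicts inside the list were already built (and cleaned) by the hook.
    return [_clean_list(x) if isinstance(x, list) else x
            for x in items if x is not None and x != ""]


def drop_empty_hook(pairs):
    # Called by the decoder for every JSON object; only list values change.
    return {k: _clean_list(v) if isinstance(v, list) else v for k, v in pairs}


sample = ('[{"stewards": ["x", ""], "owners": [null, "y"], '
          '"m": [[null, "z"]], "name": "", "sub": {"l": [null]}}]')
data = json.loads(sample, object_pairs_hook=drop_empty_hook)
# The hook never sees the top-level array, so clean it once explicitly.
data = _clean_list(data) if isinstance(data, list) else data
print(json.dumps(data))
```

Note that "name": "" survives because it is a plain dict value, not a list item.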
