How do you remove nested keys dynamically from Json using recursion?-CodePudding

I currently have a recursive function that removes ALL keys that match a pattern. Here is the background:

Example Json

{
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

Here's the function:

def remove_all_keys_matching_value(d, keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [remove_all_keys_matching_value(v, keys_to_remove) for v in d]
    return {k: remove_all_keys_matching_value(v, keys_to_remove) for k, v in d.items() if k not in keys_to_remove}

If I run the function with these keys to remove keys_to_remove = ('make', 'name') I'll get the following result:

{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{}, {
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

I want to adjust this code to be more targeted so it doesn't remove all instances of the key but rather takes into account the root value of the key/path if that makes sense.

So for example if I were to pass in a tuple containing (('owned_cars', 'make'), 'name') it would return:

{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

I know I need to keep track of the root key somehow but am unsure how to fold this in. I would appreciate any help in solving this. I always struggle when the recursion gets this complex and would love to see how someone more experienced would approach it so I can improve.

While I am interested in the solution to this problem, I'm more interested in learning how to approach a problem like this? I understand whats happening at a high level in the recursive method but struggle when I need to start stepping through it. I don't know how to make the leap to adjusting the code to identify the root path.

CodePudding user response：

division of complexity

We could start with a remove function that takes any t and any number of paths -

def remove(t, *paths):
  for p in paths:
    t = remove1(t, p)
  return t

As you can see, it has a simple operation calling remove1(t, p) for all p in the provided paths. The final t is returned. This separates the complexity of removing a single path and removing many paths. We offload the majority of the work to remove1.

remove1

Your original code is pretty close. This remove1 takes any t and a single path.

If the path is empty, return t unmodified
(inductive) the path has at least one element. If t is a list, apply remove1(e, path) for all e of the list t
(inductive) that path has at least one element and t is not a list. If t is a dictionary -
- If the path has only one element, create a new dictionary with k assigned to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t, excluding any k matching the path's element, path[0]
- (inductive) the path has at least two elements. Create a new dictionary with k assigned to the result sub-problem remove1(v, path[1:]) if k matches the first element of that path otherwise assign k to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t.
(inductive) t is a non-list and t is a non-dictionary. Return t unmodified.

def remove1(t, path):
  if not path:
    return t
  elif isinstance(t, list):
    return list(remove1(e, path) for e in t)
  elif isinstance(t, dict):
    if len(path) == 1:
      return {k:remove1(v, path) for (k,v) in t.items() if not k == path[0] }
    else:
      return {k:remove1(v, path[1:]) if k == path[0] else remove1(v, path) for (k,v) in t.items()}
  else:
    return t

modification to the input data

I added another layer to your data so we can see precisely how remove is working -

data = {
  "results": [{
    "name": "john doe",
    "age": "100",
    "owned_cars": [{
      "additional_layer": {  # <-- additional layer
        "make": "foo",
        "color": "green"
      }
    }, {
      "make": "ford",
      "color": "white"
    }, {
      "make": "bmw",
      "color": "red"
    }],
    "wished_cars": [{
      "make": "honda"
    }, {
      "make": "toyota",
      "style": "sleek"
    }, {
      "style": "fat"
    }]
  }]
}

demo

Let's see remove work now -

import json
data = { ... }
new_data = remove(data, ("owned_cars", "make"), ("style",))
print(json.dumps(new_data, indent=2))

This says remove all "make" keys that are any descendant of "owned_cars" keys and remove all "style" keys -

{
  "results": [
    {
      "name": "john doe",
      "age": "100",
      "owned_cars": [
        {
          "additional_layer": {
                                  # <-- make removed
            "color": "green"
          }
        },
        {
                                  # <-- make removed
          "color": "white"
        },
        {
                                  # <-- make removed
          "color": "red"
        }
      ],
      "wished_cars": [
        {
          "make": "honda"      # <-- make not removed
        },
        {
          "make": "toyota"     # <-- make not removed
                               # <-- style removed
        },
        {}                     # <-- style removed
      ]
    }
  ]
}