I currently have a recursive function that removes ALL keys that match a pattern. Here is the background:
Example Json
{
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
Here's the function:
def remove_all_keys_matching_value(d, keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [remove_all_keys_matching_value(v, keys_to_remove) for v in d]
    return {k: remove_all_keys_matching_value(v, keys_to_remove) for k, v in d.items() if k not in keys_to_remove}
If I run the function with keys_to_remove = ('make', 'name'), I get the following result:
{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{}, {
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
I want to adjust this code to be more targeted, so that instead of removing every instance of a key it takes the key's parent path into account, if that makes sense.
So, for example, if I were to pass in a tuple containing (('owned_cars', 'make'), 'name'), it would return:
{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
I know I need to keep track of the root key somehow but am unsure how to fold this in. I would appreciate any help in solving this. I always struggle when the recursion gets this complex and would love to see how someone more experienced would approach it so I can improve.
While I am interested in the solution to this problem, I'm more interested in learning how to approach a problem like this. I understand what's happening at a high level in the recursive method but struggle when I need to start stepping through it. I don't know how to make the leap to adjusting the code to identify the root path.
CodePudding user response:
division of complexity
We could start with a remove function that takes any t and any number of paths -
def remove(t, *paths):
    for p in paths:
        t = remove1(t, p)
    return t
As you can see, it performs one simple operation: it calls remove1(t, p) for each p in the provided paths and returns the final t. This separates the complexity of removing a single path from the complexity of removing many paths. We offload the majority of the work to remove1.
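The loop in remove is a left fold over the paths: each call receives the result of the previous one. As a minimal sketch of that idea, the same composition can be written with functools.reduce. The names remove_with and remove_top_level are hypothetical, and remove_top_level is only a toy stand-in for remove1 that drops a single top-level key.

```python
from functools import reduce

def remove_top_level(t, path):
    # Hypothetical stand-in for remove1: drops only a top-level key named path[0].
    return {k: v for k, v in t.items() if k != path[0]}

def remove_with(step, t, *paths):
    # Fold: the result of each single-path removal becomes the input of the next.
    return reduce(step, paths, t)

d = {"a": 1, "b": 2, "c": 3}
result = remove_with(remove_top_level, d, ("a",), ("c",))
print(result)  # {'b': 2}
print(d)       # {'a': 1, 'b': 2, 'c': 3} (the input is left untouched)
```

Because the many-paths part never looks inside t, it can be reasoned about and tested separately from the single-path function.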
remove1
Your original code is pretty close. This remove1 takes any t and a single path.
- If the path is empty, return t unmodified.
- (inductive) the path has at least one element. If t is a list, apply remove1(e, path) for all e of the list t.
- (inductive) the path has at least one element and t is not a list. If t is a dictionary -
  - If the path has only one element, create a new dictionary with k assigned to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t, excluding any k matching the path's element, path[0].
  - (inductive) the path has at least two elements. Create a new dictionary with k assigned to the result of the sub-problem remove1(v, path[1:]) if k matches the first element of the path, otherwise assign k to the result of the sub-problem remove1(v, path), for all k,v of the dictionary t.
- (inductive) t is a non-list and t is a non-dictionary. Return t unmodified.
def remove1(t, path):
    if not path:
        return t
    elif isinstance(t, list):
        return list(remove1(e, path) for e in t)
    elif isinstance(t, dict):
        if len(path) == 1:
            return {k: remove1(v, path) for (k, v) in t.items() if k != path[0]}
        else:
            return {k: remove1(v, path[1:]) if k == path[0] else remove1(v, path) for (k, v) in t.items()}
    else:
        return t
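One way to get comfortable stepping through a recursion like this is to print every call, indented by depth. The sketch below uses the same branching as remove1; the name remove1_traced, the depth parameter, and the small input are hypothetical additions for illustration only.

```python
def remove1_traced(t, path, depth=0):
    # Same branching as remove1, plus one print per call so the recursion is visible.
    print("  " * depth + f"remove1({type(t).__name__}, path={path})")
    if not path:
        return t
    elif isinstance(t, list):
        return [remove1_traced(e, path, depth + 1) for e in t]
    elif isinstance(t, dict):
        if len(path) == 1:
            # Last path element: drop matching keys, keep searching below the rest.
            return {k: remove1_traced(v, path, depth + 1)
                    for k, v in t.items() if k != path[0]}
        else:
            # Longer path: shorten it only underneath a key that matches path[0].
            return {k: remove1_traced(v, path[1:] if k == path[0] else path, depth + 1)
                    for k, v in t.items()}
    else:
        return t

small = {"owned_cars": [{"make": "ford", "color": "white"}], "make": "kept"}
result = remove1_traced(small, ("owned_cars", "make"))
print(result)  # {'owned_cars': [{'color': 'white'}], 'make': 'kept'}
```

In the trace, the path shrinks to ('make',) only below "owned_cars", which is why the top-level "make" survives.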
modification to the input data
I added another layer to your data so we can see precisely how remove is working -
data = {
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "additional_layer": {  # <-- additional layer
                "make": "foo",
                "color": "green"
            }
        }, {
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
demo
Let's see remove work now -
import json
data = { ... }
new_data = remove(data, ("owned_cars", "make"), ("style",))
print(json.dumps(new_data, indent=2))
This says: remove all "make" keys that are any descendant of "owned_cars" keys, and remove all "style" keys -
{
  "results": [
    {
      "name": "john doe",
      "age": "100",
      "owned_cars": [
        {
          "additional_layer": {
            # <-- make removed
            "color": "green"
          }
        },
        {
          # <-- make removed
          "color": "white"
        },
        {
          # <-- make removed
          "color": "red"
        }
      ],
      "wished_cars": [
        {
          "make": "honda" # <-- make not removed
        },
        {
          "make": "toyota" # <-- make not removed
          # <-- style removed
        },
        {} # <-- style removed
      ]
    }
  ]
}
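As a check against the original question, here is a sketch that runs remove on the question's data and compares the result with the output the question asked for. The functions are repeated so the snippet runs on its own; question_data and expected are names introduced here. Note that every path has to be a tuple, so the single key is written ("name",). A bare string such as "name" would be treated as a sequence of one-character keys and would match nothing.

```python
def remove1(t, path):
    # Same logic as remove1 above, repeated so this snippet is self-contained.
    if not path:
        return t
    if isinstance(t, list):
        return [remove1(e, path) for e in t]
    if isinstance(t, dict):
        if len(path) == 1:
            return {k: remove1(v, path) for k, v in t.items() if k != path[0]}
        return {k: remove1(v, path[1:] if k == path[0] else path)
                for k, v in t.items()}
    return t

def remove(t, *paths):
    for p in paths:
        t = remove1(t, p)
    return t

question_data = {"results": [{
    "name": "john doe",
    "age": "100",
    "owned_cars": [{"make": "ford", "color": "white"},
                   {"make": "bmw", "color": "red"}],
    "wished_cars": [{"make": "honda"},
                    {"make": "toyota", "style": "sleek"},
                    {"style": "fat"}],
}]}

# The output requested in the question.
expected = {"results": [{
    "age": "100",
    "owned_cars": [{"color": "white"}, {"color": "red"}],
    "wished_cars": [{"make": "honda"},
                    {"make": "toyota", "style": "sleek"},
                    {"style": "fat"}],
}]}

result = remove(question_data, ("owned_cars", "make"), ("name",))
print(result == expected)  # True
```

Since remove1 builds new lists and dictionaries instead of deleting keys in place, question_data still contains "name" and every "make" afterwards.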