Python DFS nested dictionary-CodePudding

I've written a function which should be able to search a nested dictionary, using DFS, to find a specific value. The recursive element seems to be working fine, however, when the base case should return True, it simply doesn't.

obj = {'a': [{'c':'d'}, {'e':'f'}],
       'b': [{'g':'h'}, {'i':'j'}]}

def obj_dfs(obj, target):
  
  if type(obj) == dict:
    for key, val in obj.items():
      obj_dfs(key, target)
      obj_dfs(val, target) 
  
  elif type(obj) in (list, tuple):
    for elem in obj:
      obj_dfs(elem, target)
  else:
    if obj == target:
      print(f"{obj} == {target}")
      return True
    else:
      print(f"{obj} != {target}")  
  return False 

obj_dfs(obj, 'j')

And the results. As you can see, the standard output "i==i" shows that this element was evaluated correctly but the return True statement isn't functioning as intended. I've verified that if I call obj_dfs(obj, 'j'), that experiences the same error.

a != j
c != j
d != j
e != j
f != j
b != j
g != j
h != j
i != j
j == j
False

Why is this? And how can I fix this?

CodePudding user response：

As the comments point out, you need to return the results of the recursive calls. Since you are just looking for a True/False match, you can pass the recursive calls into any() which will exit early with True if there is a match. The base case can simple be whether obj == target.

obj = {'a': [{'c':'d'}, {'e':'f'}],
       'b': [{'g':'h'}, {'i':'j'}]}

def obj_dfs(obj, target):
    if obj == target:
        return True

    if isinstance(obj, dict):
        return any(obj_dfs(v, target) for v in obj.items())
    
    elif isinstance(obj, (list, tuple)):
        return any(obj_dfs(l, target) for l in obj)

    return False
                   
obj_dfs(obj, 'i'), obj_dfs(obj, 'j'), obj_dfs(obj, 'd'), obj_dfs(obj, 'x')
# (True, True, True, False)

This allows for a three simple blocks. Notice we are checking for a tuple as well as a list in the last isinstance. This allows you to simply pass in the dict item()s rather than looping over keys and values independently.

Adding a print(obj) as the first line of the function will show the order in which you are traversing the data. For example obj_dfs(obj, 'j') will print:

{'a': [{'c': 'd'}, {'e': 'f'}], 'b': [{'g': 'h'}, {'i': 'j'}]}
('a', [{'c': 'd'}, {'e': 'f'}])
a
[{'c': 'd'}, {'e': 'f'}]
{'c': 'd'}
('c', 'd')
c
d
{'e': 'f'}
('e', 'f')
e
f
('b', [{'g': 'h'}, {'i': 'j'}])
b
[{'g': 'h'}, {'i': 'j'}]
{'g': 'h'}
('g', 'h')
g
h
{'i': 'j'}
('i', 'j')
i
j

CodePudding user response：

I made some edits to your code

obj = {'a': [{'c':'d'}, {'e':'f'}],
       'b': [{'g':'h'}, {'i':'j'}]}

def obj_dfs(obj, target):
  if type(obj) == dict:
    for key, val in obj.items():
      if(key==target):
        return val
      else:
        result=obj_dfs(val, target)
        if result!=None: return result
  elif type(obj) in (list, tuple):
    for elem in obj:
      result=obj_dfs(elem, target)
      if result!=None: return result
  else: 
    if obj==target: return True

print(obj_dfs(obj, 'i'))

I don't know why you would just return true instead of the value though, so I put it that if its a dictionary key it would return the value, instead it would return true, to show that it is found

CodePudding user response：

Expanding on my comment, try this, where we pass return values up the chain and always return True if a child returned True:

obj = {'a': [{'c':'d'}, {'e':'f'}],
       'b': [{'g':'h'}, {'i':'j'}]}
obj2 = {'a': [{'c':'d'}, {'e':'f'}],
       'b': [{'g':'h'}, {'g':'j'}]}

def obj_dfs(obj, target):
    if type(obj) == dict:
        for key, val in obj.items():
            keyret = obj_dfs(key, target)
            valueret = obj_dfs(val, target)
        if keyret is True or valueret is True:
            return True
        else:
            return False
    elif type(obj) in (list, tuple):
        rets = []
        for elem in obj:
            rets.append(obj_dfs(elem, target))
        if True in rets:
            return True
        else:
            return False
    else:
        if obj == target:
            print(f"{obj} == {target}")
            return True
        else:
            print(f"{obj} != {target}")
            return False

print(obj_dfs(obj, 'i'))
print(obj_dfs(obj2, 'i'))

CodePudding user response：

Recursion is a functional heritage and so using it with functional style yields the best results. This means decoupling concerns and pushing side effects to the fringe of your program. obj_dfs performs a depth-first traversal and mixes in search logic. And for purposes of debugging, includes a print side effect.

Decomposition results in functions that are easier to write, test, and reuse in various parts of our program. We'll start with a generic dfs for traversal -

def dfs(t, path = []):
  if isinstance(t, dict):
    for key, val in t.items():
      yield from dfs(val, [*path, key])
  elif isinstance(t, (list, tuple)):
    for key, val in enumerate(t):
      yield from dfs(val, [*path, key])
  else:
    yield path, t

obj = {'a': [{'c':'d'}, {'e':'f'}],
       'b': [{'g':'h'}, {'i':'j'}]}

for path, val in dfs(obj):
  print(path, val) # side effect decided by caller

['a', 0, 'c'] d
['a', 1, 'e'] f
['b', 0, 'g'] h
['b', 1, 'i'] j

The suggested solution and other answers here collapse the semantic difference between key and value, providing no differentiation on the particular match of target. Writing dfs as we did above, we can know which part of obj matched.

keys	values
['a', 0, 'c']	d
['a', 1, 'e']	f
['b', 0, 'g']	h
['b', 1, 'i']	j

has_value and has_key are easily defined in terms of dfs -

def has_value(t, target):
  for path, val in dfs(t):
    if val == target:
      return True
  return False

def has_key(t, target):
  for path, val in dfs(t):
    if target in path:
      return True
  return False

print(has_value(obj, "j"))   # True
print(has_key(obj, "j"))     # False

print(has_value(obj, "i"))   # False
print(has_key(obj, "i"))     # True