Recently, I was experimenting with writing a function to find a primitive value anywhere within an arbitrarily deeply nested sequence, and return the path taken to get there (as a list of indices inside each successive nested sequence, in order). I encountered a very unexpected obstacle: the function was finding the result, but not returning it! Instead of the correct output, the function kept returning the output which should only have been produced when attempting to find an item not in the sequence.
By placing print
statements at various points in the function, I found that the problem was that after the recursive call which actually found the item returned, others which did not find the item were also returning, and evidently later in time than the one that found it. This meant that the final result was getting reset to the 'fail' value from the 'success' value unless the 'success' value was the last thing to be encountered.
I tried fixing this by putting an extra conditional inside the function to return early in the success case, trying to preempt the additional, unnecessary recursive calls which were causing the incorrect final result. Now, this is where I ran into the root cause of the problem:
There is no way of knowing which recursive call (if any) will find the item beforehand, and once one of them does find it, it has no way of 'communicating' with the others!
The only way I could come up with of avoiding this deeper issue was to completely refactor the function to 'set' a variable outside itself with the 'success' output if and only if the 'success' condition is encountered. The external, global variable starts out set to the 'failed to find item in sequence' value, and is not reset except in the 'success' case. All the other recursive calls just return
without doing anything. This seems very ugly and inefficient, but it does work.
FIRST ATTEMPT
# ITERATIVE/RECURSIVE SEQUENCE TRAVERSER (First Attempt)
# Works on 'result1' but not on 'result2'
# Searches for 'item' in sequence (list or tuple) S, and returns a tuple
# containing the indices (in order of increasing depth) at which the item
# can be found, plus the depth in S at which 'item' was found.
# If the item is *not* found, returns a tuple containing an empty list and -1
def traverse(S, item, indices=[], atDepth=0):
# If the sequence is empty, return the 'item not found' result
if not S:
return ([], -1)
else:
# For each element in the sequence (breadth-first)
for i in range(len(S)):
# Success condition base case: found the item!
if S[i] == item:
return (indices [i], atDepth)
# Recursive step (depth-first): enter nested sequence
# and repeat procedure from beginning
elif type(S[i]) in (list, tuple):
return traverse(S[i], item, indices [i], atDepth 1)
# Fail condition base case: searched the entire length
# and depth of the sequence and didn't find the item, so
# return the 'item not found' result
else:
print("We looked everywhere but didn't find " str(item) " in " str(S) ".")
return ([], -1)
L = [0, 1, 2, [3, (4, 5, [6, 6.25, 6.5, 6.75, 7])], [[8, ()]], (([9], ), 10)]
result1 = traverse(L, 7)
result2 = traverse(L, 9)
print("-------------------------------------------")
print(result1)
print("-------------------------------------------")
print(result2)
SECOND ATTEMPT
# ITERATIVE/RECURSIVE SEQUENCE TRAVERSER (Second Attempt)
# Does not work on either test case
# Searches for 'item' in sequence (list or tuple) S, and returns a tuple
# containing the indices (in order of increasing depth) at which the item
# can be found, plus the depth in S at which 'item' was found.
# If the item is *not* found, returns a tuple containing an empty list and -1
def traverse(S, item, indices=[], atDepth=0, returnValue=None):
# If the sequence is empty, return the 'item not found' result
if not S:
print("Sequence S is empty.")
return ([], -1)
# --- ATTEMPTED FIX:
# If the item is found before the end of S is reached,
# do not perform additional searches. In addition to being
# inefficient, doing extra steps would cause incorrect false
# negatives for the item being in S.
# --- DOES NOT WORK: the underlying issue is that the multiple recursive
# calls generated at the same time can't communicate with each other,
# so the others don't 'know' if one of them already found the item.
elif returnValue:
print("Inside 'elif' statement!")
return returnValue
else:
# For each element in the sequence (breadth-first)
for i in range(len(S)):
# Success condition base case: found the item!
if S[i] == item:
# Return the depth and index at that depth of the item
print("--- Found item " str(item) " at index path " str(indices) " in current sequence")
returnValue2 = (indices [i], atDepth)
print("--- Item " str(item) " is at index path " str(returnValue2) " in S, SHOULD RETURN")
#return returnValue2 # THIS DIDN'T FIX THE PROBLEM
#break # NEITHER DID THIS
# Recursive step (depth-first): enter nested sequence
# and repeat procedure from beginning
elif type(S[i]) in (list, tuple):
# CANNOT USE 'return' BEFORE RECURSIVE CALL, as it would cause any items
# in the outer sequence which come after the first occurrence of a nested
# sequence to be missed (i.e. the item could exist in S, but if it is
# after the first nested sequence, it won't be found)
traverse(S[i], item, indices [i], atDepth 1, returnValue) # CAN'T USE 'returnValue2' HERE (out of scope);
# so parameter can't be updated in 'if' condition
# Fail condition base case: searched the entire length
# and depth of the sequence and didn't find the item, so
# return the 'item not found' result
else:
print("We looked everywhere but didn't find " str(item) " in " str(S) ".")
return ([], -1)
L = [0, 1, 2, [3, (4, 5, [6, 6.25, 6.5, 6.75, 7])], [[8, ()]], (([9], ), 10)]
result1 = traverse(L, 7)
result2 = traverse(L, 9)
print("-------------------------------------------")
print(result1)
print("-------------------------------------------")
print(result2)
THIRD AND FINAL ATTEMPT -- Working, but not ideal!
# ITERATIVE/RECURSIVE SEQUENCE TRAVERSER (Third Attempt)
# This 'kludge' is ** HIDEOUSLY UGLY **, but it works!
# Searches for 'item' in sequence (list or tuple) S, and generates a tuple
# containing the indices (in order of increasing depth) at which the item
# can be found, plus the depth in S at which 'item' was found.
# If the item is *not* found, returns nothing (implicitly None)
# The results of calling the function are obtained via external global variables.
# This 3rd version of 'traverse' is thus actually a void function,
# and relies on altering the global state instead of producing an output.
# ----- WORKAROUND: If the result is found, have the recursive call that found it
# send it to global scope and use this global variable as the final result of calling
# the 'traverse' function.
# Initialize the global variables to the "didn't find the item" result,
# so the result will still be correct if the item actually isn't in the sequence.
globalVars = {'result1': ([], -1), 'result2': ([], -1)}
def traverse(S, item, send_output_to_var, indices=[], atDepth=0):
# If the sequence is empty, return *without* doing anything to the global variable.
# It is already initialized to the "didn't find item" result.
if not S:
return
else:
# For each element in the sequence (breadth-first)
for i in range(len(S)):
# Success condition base case: found the item!
if S[i] == item:
# Set the global variable to the index path of 'item' in 'S'.
globalVars[send_output_to_var] = (indices [i], atDepth)
# No need to keep on doing unnecessary work!
return
# Recursive step (depth-first): enter nested sequence
# and repeat procedure from beginning
elif type(S[i]) in (list, tuple):
# Don't use 'return' before the recursive call, or it will miss items
# in the outer sequence after a nested sequence is encountered.
traverse(S[i], item, send_output_to_var, indices [i], atDepth 1)
# Fail condition base case: searched the entire length
# and depth of the sequence and didn't find the item.
else:
# Return *without* setting the global variable, as it is
# already initialized to the "didn't find item" result.
return
L = [0, 1, 2, [3, (4, 5, [6, 6.25, 6.5, 6.75, 7])], [[8, ()]], (([9], ), 10)]
traverse(L, 7, 'result1')
traverse(L, 9, 'result2')
print("-------------------------------------------")
print(globalVars['result1'])
print("-------------------------------------------")
print(globalVars['result2'])
I was wondering if I'm missing something and there is in fact a way of making this work without the use of external variables. The best possible solution would be somehow 'shutting down' all the other recursive calls as soon as one of them returns the success result, but I don't believe this is possible (I'd love to be wrong about this!). Or maybe some kind of 'priority queue' which delays the return
of the 'success' case recursive call (if it exists) until after all the 'fail' case recursive calls have returned?
I looked at this similar question: Recursively locate nested dictionary containing a target key and value but although the accepted answer here https://stackoverflow.com/a/59538362/18248018 by ggorlen solved OP's problem and even mentions what seems to be this exact issue ("matched result isn't being passed up the call stack correctly"), it is tailored towards performing a specific task, and doesn't offer the insight I'm looking for into the more general case.
CodePudding user response:
Your first attempt is almost perfect, the only mistake is that you return the result of searching through the first list/tuple at the current depth, regardless of whether the item
was found or not. Instead, you need to check for a positive result, and only return if it is one. That way you keep iterating through the current depth until you either find the item
or it is not found at all.
So you need to change:
return traverse(S[i], item, indices [i], atDepth 1)
to something like:
t = traverse(S[i], item, indices [i], atDepth 1)
if t != ([], -1):
return t
Full code:
def traverse(S, item, indices=[], atDepth=0):
# If the sequence is empty, return the 'item not found' result
if not S:
return ([], -1)
else:
# For each element in the sequence (breadth-first)
for i in range(len(S)):
# Success condition base case: found the item!
if S[i] == item:
return (indices [i], atDepth)
# Recursive step (depth-first): enter nested sequence
# and repeat procedure from beginning
elif type(S[i]) in (list, tuple):
t = traverse(S[i], item, indices [i], atDepth 1)
if t != ([], -1):
return t
# Fail condition base case: searched the entire length
# and depth of the sequence and didn't find the item, so
# return the 'item not found' result
else:
print("We looked everywhere but didn't find " str(item) " in " str(S) ".")
return ([], -1)
Output for your two test cases:
>>> traverse(L, 7)
([3, 1, 2, 4], 3)
>>> traverse(L, 9)
We looked everywhere but didn't find 9 in [6, 6.25, 6.5, 6.75, 7].
We looked everywhere but didn't find 9 in (4, 5, [6, 6.25, 6.5, 6.75, 7]).
We looked everywhere but didn't find 9 in [3, (4, 5, [6, 6.25, 6.5, 6.75, 7])].
We looked everywhere but didn't find 9 in [8, ()].
We looked everywhere but didn't find 9 in [[8, ()]].
([5, 0, 0, 0], 3)
Note as pointed out by @FreddyMcloughlan, atDepth
is simply the length of the returned list minus 1. So you can remove that parameter from the function call and just use:
def traverse(S, item, indices=[]):
# If the sequence is empty, return the 'item not found' result
if not S:
return ([], -1)
else:
# For each element in the sequence (breadth-first)
for i in range(len(S)):
# Success condition base case: found the item!
if S[i] == item:
return (indices [i], len(indices))
# Recursive step (depth-first): enter nested sequence
# and repeat procedure from beginning
elif type(S[i]) in (list, tuple):
t = traverse(S[i], item, indices [i])
if t != ([], -1):
return t
# Fail condition base case: searched the entire length
# and depth of the sequence and didn't find the item, so
# return the 'item not found' result
else:
print("We looked everywhere but didn't find " str(item) " in " str(S) ".")
return ([], -1)