I'm trying to traverse an object that has arbitrarily deeply nested leaves, getting the path of each root node to leaf. An example of this might be:
class Field (object):
def __init__(self, name, subfields=None):
self.name = name
self.subfields = subfields
f1 = Field("username")
f2 = Field("ip_address")
f3 = Field("song_id")
f11 = Field("song_length")
f10 = Field("misc_attrs", subfields=[f3, f11])
f4 = Field("event_time")
f5 = Field("song_name", subfields=[f1,f2])
f6 = Field("user_song_upload", subfields=[f10])
input_list = [f4,f5,f6]
My goal is to produce a list output
of lists, where each list in output
represents a path from the root node to the leaf. So in the example above, we'd have the following output:
[["event_time"],["song_name","username"],["song_name","ip_address"],["user_song_upload","misc_attrs","song_id"],["user_song_upload","misc_attrs","song_length"]]
I have a function that will return all of the leaf nodes, but I'm having trouble keeping track of the complete path for each node. Here is what I have so far:
def get_leaf_paths(input_list, output = [], path=[]):
for x in input_list:
if not x.subfields:
# we have a leaf
path.append(x.name)
output.append(path)
# next line clears the leaf from our last path
path = path[:-1]
else:
# we don't have a leaf node, and need to iterate on the subfields
path.append(x.name)
get_leaf_paths(x.subfields, output, path)
return output
However, this results in paths being built incorrectly (e.g., two adjacent top-level nodes will end up in the same inner list in the output:
[['event_time'],
['song_name', 'username', 'user_song_upload', 'misc_attrs', 'song_id'],
['song_name', 'ip_address'],
['song_name', 'username', 'user_song_upload', 'misc_attrs', 'song_id'],
['song_name', 'username', 'user_song_upload', 'misc_attrs', 'song_length']]
CodePudding user response:
You can construct an appropriate base case to simplify the code. Try the following:
def get_leaf_paths(lst):
if not lst: # empty list or None
return [[]]
return [[f.name, *path] for f in lst for path in get_leaf_paths(f.subfields)]
print(get_leaf_paths(input_list))
In the code, the base case is "an empty list or None", i.e., when subfields
is passed to the function but subfields
is actually None
. This will be the case when the function is called at a leaf node.
Output:
[['event_time'], ['song_name', 'username'], ['song_name', 'ip_address'], ['user_song_upload', 'misc_attrs', 'song_id'], ['user_song_upload', 'misc_attrs', 'song_length']]
CodePudding user response:
If you need to apply this to a very deep structure and recursion (like @j1-lee suggest in their otherwise very clean implementation) is not an option, this works:
def get_leaf_paths(lst):
stack, l, i = [], lst, 0
while l and len(l) > i:
if l[i].subfields:
stack.append((l, i))
l, i = l[i].subfields, 0
else:
yield [x[n].name for x, n in stack] [l[i].name]
i = 1
if i >= len(l):
l, i = stack.pop()
i = 1
print(list(get_leaf_paths(input_list)))
Result:
[['event_time'], ['song_name', 'username'], ['song_name', 'ip_address'], ['user_song_upload', 'misc_attrs', 'song_id'], ['user_song_upload', 'misc_attrs', 'song_length']]