Home > other >  Traversing a nested dict-like structure, keeping track of each path from root to leaf (Python)
Traversing a nested dict-like structure, keeping track of each path from root to leaf (Python)

Time:11-06

I'm trying to traverse an object that has arbitrarily deeply nested leaves, getting the path of each root node to leaf. An example of this might be:

class Field (object):
    def __init__(self, name, subfields=None):
        self.name = name
        self.subfields = subfields


f1 = Field("username")
f2 = Field("ip_address")
f3 = Field("song_id")
f11 = Field("song_length")
f10 = Field("misc_attrs", subfields=[f3, f11])

f4 = Field("event_time")
f5 = Field("song_name", subfields=[f1,f2])
f6 = Field("user_song_upload", subfields=[f10])

input_list = [f4,f5,f6]

My goal is to produce a list output of lists, where each list in output represents a path from the root node to the leaf. So in the example above, we'd have the following output:

[["event_time"],["song_name","username"],["song_name","ip_address"],["user_song_upload","misc_attrs","song_id"],["user_song_upload","misc_attrs","song_length"]]

I have a function that will return all of the leaf nodes, but I'm having trouble keeping track of the complete path for each node. Here is what I have so far:

def get_leaf_paths(input_list, output = [], path=[]):
    for x in input_list:
        if not x.subfields:
            # we have a leaf
            path.append(x.name)
            output.append(path) 
            # next line clears the leaf from our last path
            path = path[:-1]

        else:
            # we don't have a leaf node, and need to iterate on the subfields
            path.append(x.name)
            get_leaf_paths(x.subfields, output, path)
    return output

However, this results in paths being built incorrectly (e.g., two adjacent top-level nodes will end up in the same inner list in the output:

[['event_time'],
 ['song_name', 'username', 'user_song_upload', 'misc_attrs', 'song_id'],
 ['song_name', 'ip_address'],
 ['song_name', 'username', 'user_song_upload', 'misc_attrs', 'song_id'],
 ['song_name', 'username', 'user_song_upload', 'misc_attrs', 'song_length']]

CodePudding user response:

You can construct an appropriate base case to simplify the code. Try the following:

def get_leaf_paths(lst):
    if not lst: # empty list or None
        return [[]]
    return [[f.name, *path] for f in lst for path in get_leaf_paths(f.subfields)]

print(get_leaf_paths(input_list))

In the code, the base case is "an empty list or None", i.e., when subfields is passed to the function but subfields is actually None. This will be the case when the function is called at a leaf node.

Output:

[['event_time'], ['song_name', 'username'], ['song_name', 'ip_address'], ['user_song_upload', 'misc_attrs', 'song_id'], ['user_song_upload', 'misc_attrs', 'song_length']]

CodePudding user response:

If you need to apply this to a very deep structure and recursion (like @j1-lee suggest in their otherwise very clean implementation) is not an option, this works:

def get_leaf_paths(lst):
    stack, l, i = [], lst, 0
    while l and len(l) > i:
        if l[i].subfields:
            stack.append((l, i))
            l, i = l[i].subfields, 0
        else:
            yield [x[n].name for x, n in stack]   [l[i].name]
            i  = 1
        if i >= len(l):
            l, i = stack.pop()
            i  = 1


print(list(get_leaf_paths(input_list)))

Result:

[['event_time'], ['song_name', 'username'], ['song_name', 'ip_address'], ['user_song_upload', 'misc_attrs', 'song_id'], ['user_song_upload', 'misc_attrs', 'song_length']]
  • Related