Recursive function without return statement

I have some data that is inserted into a nested dictionary. The data is generated and could theoretically be arbitrarily deep. It could, for example, look like this:

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}

For some clarification: In this small sample, it means that a certain plant has 'leaves', the 'leaves' are 'dark', 'green' and 'without'. The 'green' is 'light' in this example etc.

I want to unnest this dictionary and store every key-value combination as a tuple. That could, for example, look like this:

[('leaves', 'dark'), ('leaves', 'green'), ('green', 'light'), ('without', 'veins'), ('leaves', '5'), ('veins', 'blue')]

Note: order is not important. For those interested, these tuples are further manipulated and will end up in a knowledge graph.

I thought a recursive function would do the trick here, but my function works best without the return statement, and a function without a return statement is really just a simple loop. However, I cannot make it work with a simple loop.

edit: the doubles variable is a global list.

The function I wrote:

def undict(d):
    global doubles  # doubles is the global list; += rebinds the name, so it must be declared
    for key in d.keys():
        if isinstance(d[key], dict):
            doubles += [(key, k) for k in d[key].keys()]
        undict(d[key]) # Normally: return undict(d[key])

Can anyone offer some insight into how to make it truly recursive, or how to do this with a simple loop? I am lost at this point.

CodePudding user response：

Your approach is pretty good!

However, note that you're using a global variable, doubles, rather than a local variable and a return statement, which would be cleaner.
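
As a minimal sketch of that cleaner variant, the function below keeps the structure of the code in the question but builds a local list and returns it. The name undict_pairs is made up for this illustration, and the sketch assumes, like the example data, that every value is itself a dict.

```python
def undict_pairs(d):
    # Hypothetical name; assumes every value in d is itself a dict.
    pairs = []  # local list instead of a global one
    for key, sub in d.items():
        pairs.extend((key, k) for k in sub)  # pairs for this level
        pairs.extend(undict_pairs(sub))      # pairs returned by the deeper levels
    return pairs  # a single return, after the loop has finished


data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
pairs = undict_pairs(data)
print(pairs)
```

Writing return undict(d[key]) directly inside the for loop would leave the function on the first key, so the remaining keys would never be visited. Collecting the results of the recursive calls and returning once after the loop avoids that.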

To avoid issues with .append, .extend or += on lists, a very Pythonic approach is to use a generator function, which uses the keyword yield instead of the keyword return.

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}

def undict_to_pairs(d):
    for k,v in d.items():
        if isinstance(v, dict):  # always true with your example data
            for subk in v:
                yield (k, subk)
            yield from undict_to_pairs(v)
        else:
            yield (k,v)          # this statement is never reached with your example data

print(list(undict_to_pairs(data)))
# [('leaves', 'dark'), ('leaves', 'green'), ('leaves', 'without'), ('leaves', '5'), ('green', 'light'), ('without', 'veins'), ('veins', 'blue')]

Note that with your example data, isinstance(v,dict) is always true. The else branch is never reached. So this shorter version would work too:

def undict_to_pairs(d):
    for k,v in d.items():
        for subk in v:
            yield (k, subk)
        yield from undict_to_pairs(v)

print(list(undict_to_pairs(data)))
# [('leaves', 'dark'), ('leaves', 'green'), ('leaves', 'without'), ('leaves', '5'), ('green', 'light'), ('without', 'veins'), ('veins', 'blue')]
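
Since the question also asks about a plain loop: one possible way, sketched here under the same assumption that every value is a dict, is to replace the recursion with an explicit stack of dicts that still have to be visited. The name undict_iter is hypothetical, and the pairs come out in a different order than in the recursive versions, which the question says is fine.

```python
def undict_iter(d):
    # Hypothetical name; assumes every value is a dict, as in the example data.
    pairs = []
    stack = [d]  # dicts that still have to be visited
    while stack:
        current = stack.pop()
        for key, sub in current.items():
            pairs.extend((key, k) for k in sub)  # pairs for this level
            stack.append(sub)                    # visit the nested dict later
    return pairs


data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
result = undict_iter(data)
print(sorted(result))
```

Because the stack lives in an ordinary list, this version is not limited by Python's recursion depth, which may matter if the data really does get very deep.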

Let me also suggest a different version, which is not what you asked for but looks more logical to me with regard to your data: generating long tuples instead of pairs. I removed isinstance(v, dict) from that version, since it appears the values in your data are always dicts.

def undict_to_tuples(d, acc = ()):
    if d == {}:
        yield acc
    else:
        for k,v in d.items():
            yield from undict_to_tuples(v, acc + (k,))

print(list(undict_to_tuples(data)))
# [('leaves', 'dark'), ('leaves', 'green', 'light'), ('leaves', 'without', 'veins', 'blue'), ('leaves', '5')]
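
If the knowledge graph still needs pairs, the long tuples can be turned back into them: every two neighbouring keys in a path form one pair. The helper below is only a sketch with a made-up name, paths_to_pairs. It uses a set because paths that share a prefix would otherwise produce the same pair more than once.

```python
def paths_to_pairs(paths):
    # Hypothetical helper: neighbouring keys in each path form one pair.
    pairs = set()  # a set, because paths can share a prefix
    for path in paths:
        pairs.update(zip(path, path[1:]))
    return pairs


paths = [('leaves', 'dark'), ('leaves', 'green', 'light'),
         ('leaves', 'without', 'veins', 'blue'), ('leaves', '5')]
edges = paths_to_pairs(paths)
print(sorted(edges))
```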