I have some data that is inserted into a nested dictionary. The data are created and could theoretically be endlessly deep. It could e.g. looks like this:
data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
For some clarification: In this small sample, it means that a certain plant has 'leaves', the 'leaves' are 'dark', 'green' and 'without'. The 'green' is 'light' in this example etc.
I want to unnest this dictionary and store every key, value combination into a tuple. That could for example look like this:
[('leaves', 'dark'), ('leaves', 'green'), ('green', 'light'), ('without', 'veins'), ('leaves', '5'), ('veines', 'blue')]
Note: order is not important. For those interested, these tuples are further manipulated and will end up in a knowledge graph.
I thought a recursive function would do the trick here, but my function works best without the restatement, a function without a return statement is just a simple loop. However, I cannot make it work with a simple loop.
edit: the doubles variable is a global list.
The function I wrote:
def undict(d):
for key in d.keys():
if isinstance(d[key], dict):
doubles = [(key, k) for k in d[key].keys()]
undict(d[key]) # Normally: return undict(d[key])
Maybe can anyone offer some insights on how to make it truly recursive or use a simple loop? I am lost at this point.
CodePudding user response:
Your approach is pretty good!
However, note that you're using a global variable, doubles
, rather than a local variable and a return
statement, which would be cleaner.
To avoid issues with .append
or .extend
or =
with lists, a very pythonic approach is to use a generator function, using keyword yield
instead of keyword return
.
data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
def undict_to_pairs(d):
for k,v in d.items():
if isinstance(v, dict): # always true with your example data
for subk in v:
yield (k, subk)
yield from undict_to_pairs(v)
else:
yield (k,v) # this statement is never reached with your example data
print(list(undict_to_pairs(data)))
# [('leaves', 'dark'), ('leaves', 'green'), ('leaves', 'without'), ('leaves', '5'), ('green', 'light'), ('without', 'veins'), ('veins', 'blue')]
Note that with your example data, isinstance(v,dict)
is always true. The else
branch is never reached. So this shorter version would work too:
def undict_to_pairs(d):
for k,v in d.items():
for subk in v:
yield (k, subk)
yield from undict_to_pairs(v)
print(list(undict_to_pairs(data)))
# [('leaves', 'dark'), ('leaves', 'green'), ('leaves', 'without'), ('leaves', '5'), ('green', 'light'), ('without', 'veins'), ('veins', 'blue')]
Let me also suggest a different version, which is not what you asked for but looks more logical to me in regards to your data: generating long tuples instead of pairs. I removed isinstance(v, dict)
from that version, since it appears the values in your data are always dicts.
def undict_to_tuples(d, acc = ()):
if d == {}:
yield acc
else:
for k,v in d.items():
yield from undict_to_tuples(v, acc (k,))
print(list(undict_to_tuples(data)))
# [('leaves', 'dark'), ('leaves', 'green', 'light'), ('leaves', 'without', 'veins', 'blue'), ('leaves', '5')]