Home > Software engineering >  Splitting a nested dictionary into a list of single-key paths
Splitting a nested dictionary into a list of single-key paths

Time:06-21

So let's say I have a dictionary:

{"a": {"b": 1,
       "c": 2,
       "d": {"e": 3,
             "f": 4,
            }
      }
 "g": {"h": 5,
       "i": 6
      }
}

I've been trying to find a way to map this dictionary into a list of "paths" to each end-value, as follows:

[
{"a": {"b": 1}},
{"a": {"c": 2}},
{"a": {"d": {"e": 3}}},
{"a": {"d": {"f": 4}}},
{"g": {"h": 5}},
{"g": {"i": 6}}
]

Where each list entry is a single-key nested dictionary. I assume this can be done using some form of depth-first walk with recursion, but I'm not very familiar with programming recursive functions, and also don't know if that's even the best approach.

Any input would be appreciated.

CodePudding user response:

I agree with you that a recursion is a good approach. Here is what I would do. Using a generator (i.e. yield-ing values) has the advantage that we don't have to have a variable gathering the individual result items.

The variable _revpath contains the "path" of dict keys leading to a value, but in reversed order, because at the end of the recursion we want to create a nested dict from inner to outer dict.

test = {"a": {"b": 1,
       "c": 2,
       "d": {"e": 3,
             "f": 4,
            }   
      },  
 "g": {"h": 5,
       "i": 6
      }   
}

def walkdict(d, _revpath=[]):
    if isinstance(d, dict):
        for key, value in d.items():
            yield from walkdict(value, [key]   _revpath)
    else:
        for key in _revpath:
            d = {key: d}
        yield d

print(list(walkdict(test)))

CodePudding user response:

Use recursion to build the list of nodes you pass throught

def paths(values, parents=None):
    results = []
    parents = parents or []
    for k, v in values.items():
        if isinstance(v, int):
            results.append(keys_to_nested_dict([*parents, k], v))
        else:
            results.extend(paths(v, [*parents, k]))
    return results

Then when you reach an leaf, an int, transform that list of nodes, into a nested dict

def keys_to_nested_dict(keys, value):
    result = {}
    tmp = result
    for k in keys[:-1]:
        tmp[k] = {}
        tmp = tmp[k]
    tmp[keys[-1]] = value
    return result

print(keys_to_nested_dict(['a', 'b', 'c'], 1))  # {'a': {'b': {'c': 1}}}

x = {"a": {"b": 1, "c": 2, "d": {"e": 3, "f": 4, }},
     "g": {"h": 5, "i": 6}}

print(paths(x))
# [{'a': {'b': 1}}, {'a': {'c': 2}}, {'a': {'d': {'e': 3}}}, {'a': {'d': {'f': 4}}}, {'g': {'h': 5}}, {'g': {'i': 6}}]

CodePudding user response:

Not exactly the same output, but a little tweak can you a lot of efforts & complexity

 sample = {"a": {"b": 1,"c": 2,"d": {"e": 3,"f": 4}},"g": {"h": 5,"i": 6}}
 from pandas.io.json._normalize import nested_to_record    
 flat = nested_to_record(sample, sep='_')

Output

{'a_b': 1, 'a_c': 2, 'a_d_e': 3, 'a_d_f': 4, 'g_h': 5, 'g_i': 6}

Now, whenever you wish to know the possible paths, just iterate over the keys of this dictionary & split('_') will give you the entire path & associated value

  • Related