How to find Dictionary Key(s) from Value in a large nested dictionary of variable depth?-CodePudding

Say that I have a large dictionary full of nested values such as this:

large_dic ={
...
"key":{"sub-key1" :{"sub-key2": "Test"}},
"0key":{"0sub-key1": "0Test"},
"1key":{"1sub-key1":{"1sub-key2":{"1sub-key3":"1Test"}}}
...
}

What I would like to do is to be able to get for example from the final value:

"1Test"

the key(s) to access it, such as in this case:

large_dic["1key"]["1sub-key1"]["1sub-key2"]["1sub-key3"]

Thanks for the support.

Edit to add more infos: The dictionary trees I'm talking about are linear(YAML files converted into a python dictionary structure), there is never more than one key, the ending leaf values may not be unique.

CodePudding user response：

Since OP is looking for hierarchical keys instead
I made this class :

class PointingSlice:
    def __init__(self, obj, *slices) -> None:
        self.obj = obj
        self.slices = slices

    def __str__(self):
        return f"{str(self.obj)}{''.join(map(self._repr_slice, self.slices))}"

    def _repr_slice(self, sliced: slice):
        sqbrackets = "[{}]"
        if not isinstance(sliced, slice):
            return sqbrackets.format(repr(sliced))
        items = [sliced.start, sliced.stop, sliced.step]
        fn = lambda x: str() if x is None else str(x)
        return sqbrackets.format(":".join(map(fn, items)))

    def resolve(self):
        obj = self.obj
        for sliced in self.slices:
            obj = obj.__getitem__(sliced)
        return obj

and this function for instantiation :

def find_longest(mapping, key):
    keys = [key]
    value = mapping[key]
    while isinstance(value, dict):
        ((k, value),) = value.items()
        keys.append(k)
    return PointingSlice(mapping, *keys)

Example use:

print(find_longest(large_dic, "1key"))
# output:
# {'key': {'sub-key1': {'sub-key2': 'Test'}}, '0key': {'0sub-key1': '0Test'}, '1key': {'1sub-key1': {'1sub-key2': {'1sub-key3': '1Test'}}}}['1key']['1sub-key1']['1sub-key2']['1sub-key3']
# do note that it is the same thing as large_dic['1key']['1sub-key1']['1sub-key2']['1sub-key3']
print(find_longest(large_dic, "1key").resolve()) # 1Test

So I made some changes and now it supports additional repr options matching your exact use case :

class PointingSlice:
    def __init__(self, obj, *slices, object_name=None) -> None:
        self.obj = obj
        self.slices = slices
        self.object_name = object_name

    def __str__(self):
        return f"{self.object_name or str(self.obj)}{''.join(map(self._repr_slice, self.slices))}"

    def _repr_slice(self, sliced: slice):
        sqbrackets = "[{}]"
        if not isinstance(sliced, slice):
            return sqbrackets.format(repr(sliced))
        items = [sliced.start, sliced.stop, sliced.step]
        fn = lambda x: str() if x is None else str(x)
        return sqbrackets.format(":".join(map(fn, items)))

    def resolve(self):
        obj = self.obj
        for sliced in self.slices:
            obj = obj.__getitem__(sliced)
        return obj


large_dic = {
    "key": {"sub-key1": {"sub-key2": "Test"}},
    "0key": {"0sub-key1": "0Test"},
    "1key": {"1sub-key1": {"1sub-key2": {"1sub-key3": "1Test"}}},
}


def find_longest(mapping, key):
    keys = [key]
    value = mapping[key]
    while isinstance(value, dict):
        ((k, value),) = value.items()
        keys.append(k)
    return PointingSlice(mapping, *keys)


f = find_longest(large_dic, "1key")
f.object_name = "large_dic"  # for representational purposes, it works without this
print(f)  # large_dic['1key']['1sub-key1']['1sub-key2']['1sub-key3']
print(f.resolve())  # 1Test

CodePudding user response：

Yes, it's totally possible. Here's the function to get the deeply nested value:

def get_final_value(mapping, key):
    value = mapping[key]
    while isinstance(value, dict):
        (value,) = value.values()
    return value

Example use:

>>> get_final_value(large_dic, "key")  
'Test'
>>> get_final_value(large_dic, "0key") 
'0Test'
>>> get_final_value(large_dic, "1key") 
'1Test'
>>>

CodePudding user response：

There are numerous ways to achieve this. You might want to look up "prefix tree traversal" (or "trie traversal").

A simple recursive solution with poor memory efficiency could look like this:

def find_trie_leaf_path(trie: dict, leaf_value, trie_path: list[str] = []):
    for key, value in trie.items():
        if isinstance(value, dict):
            yield from find_trie_leaf_path(value, leaf_value, trie_path   [key])
        elif value == leaf_value:
            yield trie_path   [key]

large_dic = {
    "key": {"sub-key1": {"sub-key2": "Test"}},
    "0key": {"0sub-key1": "0Test"},
    "1key": {"1sub-key1": {"1sub-key2": {"1sub-key3": "Test"}}},
}


first_match = next(find_trie_leaf_path(large_dic, "Test"))
all_matches = list(find_trie_leaf_path(large_dic, "Test"))

This should work even if your trie is very wide. If it is very high, I'd rather use an iterative algorithm.

I want to point out, though, that prefix trees are usually used the other way round. If you find yourself needing this search a lot, you should consider a different data structure.

CodePudding user response：

Can the parent keys be deduced from your final value in any way or is the tree structure rather random? If latter is the case then you'll probably just end up searching your tree until you find your value, what path search algorithm you choose for that again depends on the tree structure you have. As already asked in the comments, does each node only have one other node or is it binary or can it have many child nodes?