Home > Software engineering >  Best way to get nested dictionary items
Best way to get nested dictionary items

Time:03-15

The topic is not new and has already been discussed in multiple posts (links at the bottom). However, I felt like the resources are scattered and it is not always clear what the the best approach is. I would also like to introduce some constraints to clearly define the behaviour that I am expecting.

Say we have a nested dictionary with any number of items and arbitrary depth:

d = {"a": {"b": {"c" : 0}},
     "b": {"c" : 1},
     "c": 2}

What is the best way to get its items?

The naive approach is quite cumbersome, especially when there are many nested levels.

>>> d["a"]["b"]["c"]
0

So the first constraint is that the keys of the items to get must be provided as tuples, for example:

key = ("a", "b", "c")

The objective now is to create some function that works as follows:

>>> getitem(d, key)
0

This format can also conveniently be applied directly as the __getitem__ method of a class.

One more constraint: I want the function to fail noisily when it is asked to get a non-existing key.

>>> getitem(d, ("asd",))
...
KeyError: 'asd'

This excludes all solutions that use item getting to vivify the dictionary.

Finally, please provide low-level code if possible. If you know of a package that solves this problem please explain the underlying mechanism.

References

CodePudding user response:

I will propose 5 different solutions to get items in a nested dictionary that meet the criteria. Then, I will compare them based on the performance and readability. Conclusions at the end.

Possible solutions

  1. Use a for loop:
def getitem_for(d, key):
    for level in key:
        d = d[level]
    return d
  1. Use while
def getitem_while(d, key):
    while key:
        d = d[key[0]]
        key = key[1:]
    return d
  1. Use reduce
from functools import reduce
from operator import getitem

def getitem_reduce(d, key):
    return reduce(getitem, key, d)
  1. Use recursion
def getitem_recursive(d, key):
    if len(key) !=  1:
        return getitem_recursive(d[key[0]], key[1:])
    else:
        return d[key[0]]
  1. Finally, we can flatten the dictionary so that its keys are tuples, where each element represents a certain level. To flatten the dictionary:
def flatten(ndict):
    def key_value_pairs(d, key=[]):
        if not isinstance(d, dict):
            yield tuple(key), d
        else:
            for level, d_sub in d.items():
                key.append(level)
                yield from key_value_pairs(d_sub, key)
                key.pop()
    return dict(key_value_pairs(ndict))
>>> fd = flatten(d)
>>> fd
{('a', 'b', 'c'): 0, ('b', 'c'): 1, ('c',): 2}

Getting items is now trivial

>>> fd["a", "b", "c"]
0

Discussion

In terms of readability I find 1, 2, and 3 almost equivalent. Maybe reduce is not as well known as for and while loops, but still results in an elegant and concise one-liner. The recursive solutions 4 and 5 may be more difficult to understand, especially for beginners.

Now performance, here you have the simple speed tests that I run in a Jupyter notebook on Python 3.8.

%%timeit
getitem_for(d, key)
346 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
getitem_while(d, key)
817 ns ± 67.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
getitem_reduce(d, key)
445 ns ± 11.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
getitem_recursive(d, key)
1.06 µs ± 69.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
df[key]
112 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The best approach seems to be the flattened dictionary; however, here it is how long it takes to create it from the original one:

%%timeit
flatten(d)
7.96 µs ± 779 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The recursive function and while loop are definitely to exclude. The for loop and reduce versions are comparable, even though the for loop is faster.

Conclusions

The performance tests that I run are not precise, do not necessarily apply to all nested dictionaries and Python versions. However, they help identify the for loop and reduce versions as good candidates to efficiently get the items of a nested dictionary. All solutions investigated fail noisily when trying to get a key does not exist.

Flat dictionaries are by far superior to all other options, but one must take into account the cost of flattening. This shows that you should prefer flat dictionaries over nested whenever you have control over the source of data.

CodePudding user response:

You could use python-benedict (I developed it), it's dict wrapper with many reusable features, including keypath support.

The library code is open-source and available on GitHub: https://github.com/fabiocaccamo/python-benedict

Installation:

pip install python-benedict

Usage:

from benedict import benedict

d = {"a": {"b": {"c" : 0}},
     "b": {"c" : 1},
     "c": 2}

key = ["a", "b", "c"]

b = benedict(d)
print(b[key)) # -> 0
  • Related