Say I want to store ordered values where the key values represent a lower bound. Like this example:
d = {1: "pear", 4: "banana", 7: "orange"}
I can access the first object by d[1]
. Say I want to store it so that I can access the first object "pear"
by calling for any value between [1,4)
. If I input any "keyvalue"
between [4,7)
I want "banana"
to be returned. Is there any type of data structure like that in python? I found intervalTrees
, but it looked a bit more advanced than what I was looking for. In intervalTrees
the intervals which are the keys, can be overlapping and I don't want that. Or maybe it is not a dictionary of any type I want since you can mix datatypes as keys in one dictionary. What do you think?
EDIT: From the tip I got, this would be a working code:
import bisect
d = [(1, "pear"), (4, "banana"), (7,"orange") ]
keys = [j[0] for j in d]
for v in range(1,10):
print("Using input ", v)
i = bisect.bisect(keys, v) - 1
out = d[i]
print(out)
print("")
# Or using SortedDict
from sortedcontainers import SortedDict
d2 = SortedDict()
d2[1] = 'pear'
d2[4] = 'banana'
d2[7] = 'orange'
for v in range(1,10):
print("Using input ", v)
i = bisect.bisect(d2.keys(), v) - 1
j = d2.keys()[i]
out = d2[j]
print(out)
print("")
CodePudding user response:
The data structure you're looking for is a binary search tree (BST), and preferably a balanced BST. Your dictionary keys are the keys of the BST, and each node would just have an additional field to store the corresponding value. Then your lookup is just a lower-bound / bisect-left on the keys. Looking up Python implementations for Red-Black trees or AVL trees returns many possible packages.
There is no builtin library for always-sorted data. If you never need to add or delete keys, you can use bisect with (key, value) tuples in a sorted list.
For a pure Python implementation that allows modification, I would recommend checking out SortedDict from the SortedContainers library. It's built to be a drop-in replacement for BST's, is very usable and tested, and claims to outperform pointer-based BST's in memory and speed on reasonably sized datasets (but does not have the same asymptotic guarantees as a BST). You can also provide a custom key for comparing objects of different types.