What is a good data-structure for doing pair-key lookup in Python?-CodePudding

By pair-key I mean two keys as a pair mapped to a single value.

F.ex I could use a set of strings as keys, and i could search by iterating, though I’m wondering it there’s a better way to structure the data and find pairs based on either one of or both the keys.

d = {
    {"a", "b"}: "a-b relationship",
    {"c", "d"}: "c-d relationship",
    {"e", "f"}: "e-f relationship"
}

# single key
query = "a"
match = [v for k, v in d.items() if query in k]

# both keys
query = {"a", "b"}
match = [v for k, v in d.items() if query == k]

CodePudding user response：

The simplest version would be just using frozenset as keys:

my_dict = {
    frozenset({"a", "b"}): "a-b relationship",
    frozenset({"c", "d"}): "c-d relationship",
    frozenset({"e", "f"}): "e-f relationship"
}

# both keys
my_dict[frozenset({"a", "b"})]
'a-b relationship'

# single key
[v for k, v in my_dict.items() if "a" in k]
['a-b relationship']

But you may also want to try making your own dict-like class with collections.abc:

import collections.abc as abc

class pair_dict(abc.MutableMapping):

    def __init__(self):
        self._pairs = {}
        self._singles = {}
        super().__init__()

    def __getitem__(self, key):
        if isinstance(key, abc.Iterable) and not isinstance(key, str):
            return self._pairs[frozenset(key)]
        return set(self._singles[key].values())

    def __setitem__(self, key, value):
        if isinstance(key, abc.Iterable) and not isinstance(key, str):
            key = frozenset(key)
            self._pairs[key] = value
            for k in key:
                self._singles.setdefault(k, {})[key] = value
        else:
            raise ValueError('key must be an iterable, not just a string')

    def __delitem__(self, key):
        if isinstance(key, abc.Iterable) and not isinstance(key, str):
            key = frozenset(key)
            del self._pairs[key]
            for k in key:
                del self._singles.setdefault(k, {})[key]
        else:
            raise ValueError('key must be an iterable, not just a string')

    def __iter__(self):
        return iter(self._pairs)

    def __len__(self):
        return len(self._pairs)

You can then use your class for both purposes:

d = pair_dict()
d[('a', 'b')] = 'AB'
d[('b', 'c')] = 'BC'
d[('a', 'c')] = 'AC'

list(d.items())
[(frozenset({'a', 'b'}), 'AB'),
 (frozenset({'b', 'c'}), 'BC'),
 (frozenset({'a', 'c'}), 'AC')]

d[('a', 'b')]
'AB'

d['a']
{'AB', 'AC'}

collections.abc has a list of which methods you have to override for each abstract base class (abc) it provides near the top of the documentation linked above.

You'll also want to add other convenience methods like __repr__ and/or __str__ if you use this as a public-facing class.

Note: I used a separate internal dict and dict of dict for pairs and single references. That makes single lookups much faster, but at the cost of a small extra memory use and making item setting a little slower. Adjust to your use case. =)

CodePudding user response：

Here is a suggestion, writing your addKeyPair function to override the behaviour of d[(a, b)] = v.

def addKeyPair(d, a, b, v):
    d[(a,b)] = v
    d[(b,a)] = v
    d.setdefault(a, []).append(v)
    d.setdefault(b, []).append(v)

d = {}
for a,b in ('ab', 'bc', 'ca'):
    addKeyPair(d, a, b, a b)

print(d)
# {('a', 'b'): 'ab', ('b', 'a'): 'ab', 'a': ['ab', 'ca'],
#  'b': ['ab', 'bc'], ('b', 'c'): 'bc', ('c', 'b'): 'bc',
#  'c': ['bc', 'ca'], ('c', 'a'): 'ca', ('a', 'c'): 'ca'}

print(d[('a', 'b')])
# ab

print(d[('b', 'a')])
# ab

print(d['a'])
# ['ab', 'ca']