I have the following object that I would like to keep in a container that is sorted on insertion and does not contain duplicates, so I am using a SortedSet
from sortedcontainers import SortedSet, SortedList
class R():
def __hash__(self):
return hash(self.person_id)
def __eq__(self, other):
return self.__class__ == other.__class__ and self.person_id == other.person_id
def __nq__(self, other):
return not (self == other)
def __lt__(self, other):
return other.value < self.value
def __init__(self, person_id, value):
self.person_id = person_id
self.value = value
def __repr__(self):
return "person: %s (%s)" % (self.person_id, self.value)
x = SortedSet()
x.add(R(13, 2))
x.add(R(17, 4))
x.add(R(11, 21))
x.add(R(7, -41))
print(x)
When I run this code I get the following output as expected:
SortedSet([person: 11 (21), person: 17 (4), person: 13 (2), person: 7 (-41)])
However if I added an extra duplicate element i.e. 17:
x.add(R(13, 2))
x.add(R(17, 4))
x.add(R(11, 21))
x.add(R(7, -41))
x.add(R(17, -67))
print(x)
I expect the R object with id 17 named person: 17 (4)
to be moved to the back with value person: 17 (-67)
like:
SortedSet([person: 11 (21), person: 13 (2), person: 7 (-41), person: 17 (-67)])
However nothing changes:
SortedSet([person: 11 (21), person: 17 (4), person: 13 (2), person: 7 (-41)])
How can I achieve the desired output as described using a SortedSet
or any other container that is sorted on insertion and has no duplicates?
CodePudding user response:
DeepSpace's answer covers making this work (if somewhat inefficiently), but I'm going to issue a frame challenge here: This is a bad design.
Sets (the logical construct) are designed to store unique items. If something add
ed to a set is equal to something already in it, there's no reason to replace the old item, because the old item and the new item are equivalent. If your class does not use a definition of equality in which equality implies substitutability (two equal instances could be use interchangably in all relevant ways), then the instances aren't suitable for use in a set
. Even without SortedSet
getting involved, using plain set
, this wouldn't work, because set.add
won't replace an item when you insert an "equal" item; they're both equivalent after all, so why do the extra work?
When you need to have a concept of keys that can map to values, where the values for a given key can be changed later without knowing the original value, you want a mapping (dict
-like), not a set (set
-like).
Kelly Bundy suggests that what you want may already exist in the sortedcollections
package (ValueSortedDict
), so I'd go with that if it works. Since sortedcontainers
contains nothing that allows replacing values and sorting on the values, you'd have to go to a lot of work to add that behavior, roughly the same order of magnitude as implementing it yourself from scratch.
Additional notes on why this doesn't work:
On top of your use case being fundamentally unsuited to sets (the logical concept, not merely set
itself), SortedSet
itself is unusually inappropriate for your class, because implicitly relies on two invariants (only one of which is strictly required by Python, though the other is usually adhered to):
- Required by Python:
__eq__
should be consistent with__hash__
: If two items are equal, they must have the same hash, and, as much as possible, two unequal items should have the same hash (ideally the hash should be based on the same fields__eq__
compares, but it's legal to base it on a subset of those fields) - Required by SortedSet (and often assumed by other things that work with sorted objects):
__eq__
should be consistent with__lt__
(and all other rich comparison operators): Ifa == b
, thena < b
andb < a
should both be false; similarly, ifa < b
orb < a
is true, thena != b
. Most sorting stuff in Python sticks to__lt__
comparisons exclusively to allow inconsistent definitions, but if you stick the same objects in atuple
for comparisons, suddenly the lexicographic ordering rules meanstuple
's own__lt__
implementation relies on both the__lt__
and__eq__
of your class, so in practice you want them consistent anyway.
Your class violates #2; the sorting rules are completely unrelated to the definition of equality. SortedSet
muddles along here, determining uniqueness based on __hash__
__eq__
and ordering with __lt__
, but in certain circumstances (e.g. when removing elements) it relies on __lt__
being consistent with __eq__
. Specifically, after removing from the internal set
(using __hash__
__eq__
_) it then removes from the internal SortedList
, which bisects to find the element to remove, using __lt__
, and confirms it found the right element with an equality check, using __eq__
. Since __eq__
and __lt__
are inconsistent (they'd only match if you were trying to remove an R
with the same person_id
and value
), this never finds the value it's trying to remove, and it raises an exception.
CodePudding user response:
You can subclass SortedSet
, overriding its add
and remove
methods. We need to override remove
because the original implementation uses self._list.remove
which will fail because the two R
objects will not be identified as equal.
class MySortedSet(SortedSet):
def add(self, value):
if value in self:
self.remove(value)
super().add(value)
def remove(self, value):
self._set.remove(value)
for index, e in enumerate(self._list[:]):
if hash(e) == hash(value):
self._list.pop(index)
break
x = MySortedSet()
x.add(R(13, 2))
x.add(R(17, 4))
x.add(R(11, 21))
x.add(R(7, -41))
x.add(R(17, -67))
print(x)
outputs
MySortedSet([person: 11 (21), person: 13 (2), person: 7 (-41), person: 17 (-67)])