I'm implementing my own class, with custom __eq__
. And I'd like to return True
for things that are not "equal" in a mathematical sense, but "match" in a fuzzy way.
An issue with this is, however, that this leads to loss of transitivity in a mathematical sense, i.e. a == b && b ==c
, while a
may not be equal to c
.
Question: is Python dependent on __eq__
being transitive? Will what I'm trying to do break things, or is it possible to do this as long as I'm careful myself not to assume transitivity?
Use case
I want to match telephone numbers with one another, while those may be either formatted internationally, or just for domestic use (without a country code specified). If there's no country code specified, I'd like a number to be equal to a number with one, but if it is specified, it should only be equal to numbers with the same country-code, or without one.
So:
- Of course,
31 6 12345678
should equal31 6 1234567
, and06 12345678
should equal06 12345678
31 6 12345678
should equal06 12345678
(and v.v.)49 6 12345678
should equal06 12345678
(and v.v.)- But
31 6 12345678
should not be equal to49 6 12345678
Edit: I don't have a need for hashing (and so won't implement it), so that at least makes life easier.
CodePudding user response:
There is no MUST but a SHOULD relation for comparisons being consistent with the commonly understood relations. Python expressively does not enforce this and float
is an inbuilt type with different behaviour due to float("nan")
.
Expressions: Value comparisons
[…]
User-defined classes that customize their comparison behavior should follow some consistency rules, if possible:
- […]
- Comparison should be transitive. The following (non-exhaustive) examples illustrate that:
- x > y and y > z implies x > z
- x < y and y <= z implies x < z
Python does not enforce these consistency rules. In fact, the not-a-number values are an example for not following these rules.
Still, keep in mind that exceptions are incredibly rare and subject to being ignored: most people would treat float
as having total order, for example. Using uncommon comparison relations can seriously increase maintenance effort.
Canonical ways to model "fuzzy matching" via operators are as subset, subsequence or containment using unsymmetric operators.
- The
set
andfrozenset
support>
,>=
and so on to indicate that one set encompases all values of another.>>> a, b = {1, 5, 6, 8}, {5, 6} >>> a >= a, a >= b, b >= a (True, True, False)
- The
str
andbytes
supportin
to indicate that subsequences are covered.>>> a, b = " 31 6 12345678", "6 12345678" >>> a in b, b in a (False, True)
- The
range
andipaddress
Networks supportin
to indicate that specific items are covered.>>> IPv4Address('192.0.2.6') in IPv4Network('192.0.2.0/28') True
Notably, while these operators may be transitive they are not symmetric. For example, a >= b and c >= b
does not imply b >= c
and thus not a >= c
or vice versa.
Practically, one could model "number without country code" as the superset of "number with country code" for the same number. This means that 06 12345678 >= 31 6 12345678
and 06 12345678 >= 49 6 12345678
but not vice versa. In order to do a symmetric comparison, one would use a >= b or b >= a
instead of a == b
.
CodePudding user response:
__eq__
method should be transitive; at least it is what dictionaries assume.
class A:
def __init__(self, name):
self.name = name
def __eq__(self, other):
for element in self.values:
if element is other:
return True
return False
def __hash__(self):
return 0
def __repr__(self):
return self.name
x, y, z = A('x'), A('y'), A('z')
x.values = [x,y]
y.values = [x,y,z]
z.values = [y,z]
print(x == y)
--> True
print (y == z)
--> True
print(x == z)
--> False
print({**{x:1},**{y:2, z: 3}})
--> {x: 3}
print({**{x:1},**{z:3, y:2}})
--> {x: 1, z: 2}
{**{x:1},**{y: 2, z:3}}
is the union of two dictionaries. No one expects a dictionary to delete a key after updating it.
print(z in {**{x:1},**{y:2, z: 3}})
--> False
By changing the order in the union you can even get different sized dictionaries:
print(len({**{x:1},**{y:2, z: 3}}))
--> 1
print(len({**{x:1},**{z:3, y:2}}))
--> 2