Home > Software design >  Do tuples in python always have the property that `hash(x) == hash(y)` if and only if `x == y`?
Do tuples in python always have the property that `hash(x) == hash(y)` if and only if `x == y`?

Time:08-10

Suppose that tuppy and guppy are:

  • both instantiated from the tuple class
  • the only elements of tuppy and guppy are integers such as 0, 1, 2, etc...

Is it the case that hash(tuppy) == hash(guppy) if and only if tuppy == guppy?

Note that the following all amount to nearly the same thing:

number = hash(t)
number = t.__hash__()
number = tuple.__hash__(t)

Will hash(x) == hash(y) always return True for examples like the following?

x = tuple([0, 1])

y = tuple(range(0, 2))

print("x == ".ljust(20), type(x), repr(x))
print("y == ".ljust(20), type(y), repr(y))
print("hash(x) == hash(y)".ljust(20), hash(x) == hash(y))

I have two concerns:

  • two different tuples hashing to the same value. For example (1234, 5) and (1, 2345) might, theoretically, both hash to the same value.

  • hash(copy.copy(x)) might be different than hash(x).

  • Do tuples have the behavior that hash(x) == hash(y) if and only if x == y, provided that:

    • the instances of tuple come directly from tuple and are not instantiated from a sub-class of tuple?
    • the elements of the tuple are integers?

The above assumes that we are working with tuples of integers.

What about nested trees of tuples?

import copy

t1a = (1, (2, (3, (4, 5), 6)))
t1b = tuple(eval("[1, (2, (3, (4, 5), 6))]"))

t2a = ((((1, 2), 3), 4), 5, 6)
t2b = copy.deepcopy(t2a)

print("hash(t1a) == hash(t1b)", hash(t1a) == hash(t1b))
print("hash(t1a) == hash(t2a)", hash(t1a) == hash(t2a))
print("hash(t2a) == hash(t2b)", hash(t2a) == hash(t2b))

I know the answer for a smattering of 3 or 4 test-cases, but what about in general?

If you have huge tuples (say, 1 gigabyte each) will the hash values (int) get truncated? I could imagine that the concatenation of x and y has the same hash value as x if x is sufficiently long.

Suppose we had a million different tuples in memory. We might begin to have many different tuples associated with the same hash value.

CodePudding user response:

As usual for well-formed hash functions, a == b implies hash(a) == hash(b), but hash(a) == hash(b) does not necessarily imply that a == b.

CodePudding user response:

Regardless of type, a __hash__() implementation is playing by the rules if x == y implies hash(x) == hash(y). There is no reliable implication in the other direction: from hash(x) == hash(y), nothing follows about how x compares to y.

This is all true even for simple types like ints. There are an unbounded number of possible Python ints, but only a finite number of possible hash codes. Therefore there must exist distinct ints i and j such that hash(i) == hash(j).

  • Related