Do tuples in python always have the property that `hash(x) == hash(y)` if and only if `x == y`?-CodePudding

Suppose that tuppy and guppy are:

both instantiated from the tuple class
the only elements of tuppy and guppy are integers such as 0, 1, 2, etc...

Is it the case that hash(tuppy) == hash(guppy) if and only if tuppy == guppy?

Note that the following all amount to nearly the same thing:

number = hash(t)
number = t.__hash__()
number = tuple.__hash__(t)

Will hash(x) == hash(y) always return True for examples like the following?

x = tuple([0, 1])

y = tuple(range(0, 2))

print("x == ".ljust(20), type(x), repr(x))
print("y == ".ljust(20), type(y), repr(y))
print("hash(x) == hash(y)".ljust(20), hash(x) == hash(y))

I have two concerns:

two different tuples hashing to the same value. For example (1234, 5) and (1, 2345) might, theoretically, both hash to the same value.
hash(copy.copy(x)) might be different than hash(x).
Do tuples have the behavior that hash(x) == hash(y) if and only if x == y, provided that:
- the instances of tuple come directly from tuple and are not instantiated from a sub-class of tuple?
- the elements of the tuple are integers?

The above assumes that we are working with tuples of integers.

What about nested trees of tuples?

import copy

t1a = (1, (2, (3, (4, 5), 6)))
t1b = tuple(eval("[1, (2, (3, (4, 5), 6))]"))

t2a = ((((1, 2), 3), 4), 5, 6)
t2b = copy.deepcopy(t2a)

print("hash(t1a) == hash(t1b)", hash(t1a) == hash(t1b))
print("hash(t1a) == hash(t2a)", hash(t1a) == hash(t2a))
print("hash(t2a) == hash(t2b)", hash(t2a) == hash(t2b))

I know the answer for a smattering of 3 or 4 test-cases, but what about in general?

If you have huge tuples (say, 1 gigabyte each) will the hash values (int) get truncated? I could imagine that the concatenation of x and y has the same hash value as x if x is sufficiently long.

Suppose we had a million different tuples in memory. We might begin to have many different tuples associated with the same hash value.

CodePudding user response：

As usual for well-formed hash functions, a == b implies hash(a) == hash(b), but hash(a) == hash(b) does not necessarily imply that a == b.

CodePudding user response：

Regardless of type, a __hash__() implementation is playing by the rules if x == y implies hash(x) == hash(y). There is no reliable implication in the other direction: from hash(x) == hash(y), nothing follows about how x compares to y.

This is all true even for simple types like ints. There are an unbounded number of possible Python ints, but only a finite number of possible hash codes. Therefore there must exist distinct ints i and j such that hash(i) == hash(j).