I came across some an hashcode function that does kind of the following:
class MyClass{
private String string;
//..other data members and methods...
public int hashCode()
{
int result = 17;
if(string != null)
{
result = result*31 string.hashCode;
}
return result;
}
};
I am not fully convinced of the method used to calculate the hashCode, I know use of prime numbers yields a better distribution in general. But in this implementation I am not really convinced that's the case.
For example assuming a standard hash implementation I would miss all the buckets between 0 and 17*31.
Is there maybe some subtlety that I am not seeing?
CodePudding user response:
You're missing the overflow case, which is actively likely in hashCode implementations.
In particular, string.hashCode can be negative.
CodePudding user response:
As in the question Is the hashCode function generated by Eclipse any good? (originally duped against this answer, reopened by request), this hashCode function matches implementations built into Java and recommended by Java co-author Joshua Bloch in Effective Java Item 9. This is similar to the Annotation docs, which prescribe a hash function that is the sum of (member value hash code) xor (127 * member name hash code) for all members. By picking prime numbers to start out with--here, 17 and 31--the hash factors would be necessarily coprime.
As in the Objects.hashCode documentation, the important things are that the hashCode is consistent between runs, consistent with equals
, and distinct if practical.
One major factor about hash code design is that hash codes will wrap around. As in the OpenJDK8 code for HashMap:
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
The table length, necessarily a power of two, becomes a mask for the hashCode: For a hash table of size 64 the hash gets a bit mask of 63, 0b00111111
. Given the prime number "hash smear", these low bits will be well-distributed, no more or less than if the 17 and 31 factors were there for a single-field hash function, but of particular advantage if there were two, three, or fifty fields all being combined into a single hash function. The absolute magnitude of the returned hashCode
doesn't matter, as long as the appropriate low bits of the hash codes are well-distributed.