Home > Net >  What does "assuming the hash function disperses the elements properly among the buckets" m
What does "assuming the hash function disperses the elements properly among the buckets" m

Time:02-21

Having started learning Java, I came across this statement in the docs of Java 8:

assuming the hash function disperses the elements properly among the buckets.

Does that simply mean that the order you get, after assigning, will be a mess?

CodePudding user response:

A hash map contains a number of "buckets". For best performance, you want the number of entries to be more or less the same in each bucket. The bucket is determined by the hash function; thus you want a hash function that results in more or less the same probability of hitting each bucket. That is, "the hash function disperses the elements properly among the buckets".

At the other extreme: a hash function that always returned, say, the value 3 would work, but map access wouldn't be very efficient, since one bucket would have all the entries.

I don't understand what you mean by the order being a "mess". A hash map is not ordered; the location of an element depends on its hash code.

CodePudding user response:

It means that HashMap maintains an array of buckets under the hood. Hash code produced by a key object determines to which bucket this entry should go.

A situation when multiple keys yield similar hashes and as a consequence are mapped to the same bucket is called a collision.

Entries of the map that are mapped to the same bucket will be structured as a linked list. Starting with Java 8 when a number of collisions grow after a certain threshold the list will be transformed into a tree.

As you probably know the cost of accessing an element under a certain index in the array is O(1). And HashMap provides access to the values by key with amortized time complexity O(1), but only if a number of collisions is neglectable. I.e. hashCode() is implemented in such a way that it allows to spread the keys relatively evenly between the buckets.

In the edge case when the hash function is badly implemented and, let's say, it returns the same hash for every key all the entries end up in the same bucket. The time complexity for methods like get(), containsKey() degrades to O(n) (with Java 7 and earlier) because you have to iterate over the list of all entries in order to find a particular one. And with Java 8 onwards the time complexity will be O(log n) because that is the worse time required to find an element in a red-black tree.

Does that simply mean that the order you get, after assigning, will be a mess?

The order elements in the HashMap is undefined. This class is useful when you need quick access and don't care about the order. If need an ordered map consider LinkedHashMap which tracks the order in which the entries were added to a map by maintaining a linked list or TreeMap which sorts keys ordered accordingly to their natural order or based on the given comparator.

  • Related