I have data that I want to lookup by key.
My particular use case is that the data (key/value and number of elements) does not change once the map is initialised. All key/value values are known at once.
I have generally use a HashMap for this with default constructor (default initial capacity and load factor).
What is the best way build this Map? If I was to use HashMap, what should the default initial capacity and load factor be set to? Is Map.copyOf() a better solution? Does the size of the map matter (20 elements vs 140,000)?
This article https://docs.oracle.com/en/java/javase/15/core/creating-immutable-lists-sets-and-maps.html#GUID-6A9BAE41-A1AD-4AA1-AF1A-A8FC99A14199 seems to imply that non mutable Map returned by Map.copyOf() is more space efficient.
CodePudding user response:
HashMap
is fairly close to optimal in most cases already. The array of buckets doubles in capacity each time, so it's most wasteful when you have (2^N) 1 items, since the capacity will necessarily be 2^(N 1) (i.e. 2049 items require capacity of 4096, but 2048 items fit perfectly).
In your case, specifying an initial size will only prevent a few reallocations when the map is created, which if it only happens once probably isn't relevant. Load factor is not relevant because the map's capacity will never change. In any case, if you did want to pre-size, this would be correct:
new HashMap<>(numItems, 1);
Does the size of the map matter (20 elements vs 140,000)?
It will have an impact, but not a massive one. Items are grouped into buckets, and buckets are structured as lists or trees. So the performance is mostly dependent on how many items are in a given bucket, rather than the total number of items across all buckets.
What's important is how evenly distributed across your buckets the items are. A bad hash code implementation will result in clustering. Clustering will start to move O(1) operations towards O(log n), I believe.
// The worst possible hashCode impl.
@Override
public int hashCode() { return 0; } // or any other constant
If you have the same items in the map across multiple invocations of your application (not clear from the question if that's the case), and if the class of the key is under your control, then you have the luxury of being able to tweak the hashCode implementation to positively affect the distribution, e.g. by using different prime numbers as a modulus. This would be trial and error, though, and is really only a micro-optimization.
As for the comments/answers addressing how to confer immutability, I'd argue that that's a separate concern. First work out what map is actually optimal, then worry about how to confer immutability upon it, if it isn't already. You can always wrap a mutable map in Collections.unmodifiableMap
. Supposedly Guava's ImmutableMap is slower than HashMap, and I suspect other immutable variants will struggle to exceed the performance of HashMap too.
CodePudding user response:
Have a look at this aticle: https://www.baeldung.com/java-immutable-maps. There are some ready-to-use implementation including that one you mentioned in the question.