NPE when retrieving from HashMap even though containsKey() returns true in multithreaded environment-CodePudding

We are trying to store unique object for a particular key. When getMyObject is called in multithreaded environment we are getting null ptr exception at the time of return statement

object SampleClass
{
 
    fun getMyObject(Id : String) : MyObject
    {
        if(!myMap.containsKey(Id))
        {
            synchronized(SampleClass)
            {
                if(!myMap.containsKey(Id))
                {
                    myMap[Id] = MyObject()
                }
            }
        }
        return myMap[Id]!!
    }

    private val myMap = HashMap<String,MyObject>()
}

It seems even though contains method returns true when we try to get the value the value is returned null. I am not sure what is the reason behind it.

CodePudding user response：

This is the kind of trouble you get into if you try to outsmart the memory model. If you look at HashMap's source, you'll find containsKey implemented as:

public boolean containsKey(Object key) {
  return getNode(key) != null;
}

Note that it returns true only if there's a HashMap.Node object corresponding to the given key. Now, this is how get is implemented:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(key)) == null ? null : e.value;
}

What you're seeing is an instance of the unsafe publication problem. Let's say 2 threads (A & B) call getMyObject for a non-existent key. A is slightly ahead of B, so it gets into the synchronized block before B calls containsKey. Particularly, A calls put before B calls containsKey. The call to put creates a new Node object and puts it into the hash map's internal data structures.

Now, consider the case when B calls containsKey before A exists the synchronized block. B might see the Node object put by A, in such case containsKey returns true. At this point, however, the node is unsafely published, because it is accessed by B concurrently in a non-synchronized manner. There's no guarantee its constructor (the one setting its value field) has been called. Even if it was called, there's no guarantee the value reference (or any references set by the constructor) is published along with the node reference. This means B can see an incomplete node: the node reference but not its value or any of its fields. When B proceeds to get, it reads null as the value of the unsafely published node. Hence the NullPointerException.

Here's an ad-hoc diagram for visualizing this:

         Thread A                                     Thread B
- Enter the synchronized block
  - Call hashMap.put(...)
    - Insert a new Node
                                           - See the newly inserted (but not yet 
                                             initialized from the perspective of B)
                                             Node in HashMap.containsKey
                                           - Return node.value (still null)
                                             from HashMap.get
                                           - !! throws a `NullPointerException`
    ...
- Exit the synchronized block
  (now the node is safely published)

The above is just one scenario where things can go wrong (see comments). To avoid such hazards, either use a ConcurrentHashMap (e.g. map.computeIfAbsent(key, key -> new MyObject())) or never access your HashMap concurrently outside of a synchronized block.