Why is key being deleted by IdentityHashMap


I know IdentityHashMap is not a thread-safe data structure, so the test below failing is not a total surprise. That said, my understanding of non-concurrent, non-synchronized maps being not thread-safe was that there is no happens-before relation between the put of a value into the map and the get of that value, so a thread that retrieves the value from the map may not see all of its fields if the value has publication problems.

However, I am unable to understand how a Map can lose values entirely, even if it is not thread-safe. See the reprex below:

import org.junit.jupiter.api.RepeatedTest;

import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertTrue;

class IdentityHashmapTest {

    static class Key {
        final String description;

        Key(String description) {
            this.description = description;
        }

        @Override
        public String toString() {
            return "Key{"  
                    "description='"   description   '\''  
                    '}';
        }

        @Override
        public boolean equals(Object o) {
            return this == o;
        }

        @Override
        public int hashCode() {
            return Objects.hash(description);
        }
    }

    @RepeatedTest(100)
    void testConcurrency() throws InterruptedException {
        final Map<Key, Integer> map = new IdentityHashMap<>();
        Key mainKey = new Key("main");
        map.put(mainKey, 1);
        ExecutorService executor = Executors.newFixedThreadPool(4);
        assertNotNull(map.get(mainKey));
        for (int i = 0; i <= 5; i++) {
            executor.submit(() -> {
                for (int j = 1; j <= 5; j++) {
                    map.put(new Key("random"), ThreadLocalRandom.current().nextInt());
                }
            });
        }
        executor.shutdown();
        assertTrue(executor.awaitTermination(30, TimeUnit.SECONDS));
        assertNotNull(map.get(mainKey));
    }
}

I insert mainKey before I make any concurrent updates to the IdentityHashMap, but after all the threads have shut down I can see that the value has disappeared from the map in some cases. Note that not all 100 invocations of the RepeatedTest fail; on average 10-20 invocations fail.

I suspect that this might be because of resizing, because if I initialize the IdentityHashMap with a large expectedMaxSize all tests pass. I also see special handling of resizing in IdentityHashMap's put, which adds to my suspicion.

I want to understand how exactly the value is getting erased from the IdentityHashMap. I already know the fix for the problem (i.e. use a thread-safe data structure like ConcurrentHashMap); I am interested in understanding the exact root cause of this inconsistency.

CodePudding user response:

You have multiple threads working on the same structure with no coordination. Each thread assumes that it is safe to manipulate the map however it wants and acts accordingly. When more than one thread does that, all bets are off. There is no "expected behaviour", except that you should expect it to break.

A map is not just an infinitely big bag of data you can keep stuffing more entries into. It needs reorganizing as it grows. An insert is not just setting a null element in some array to a non-null value. It may result in a complete reorganization of everything in the map.

If you have 2 threads trying to do this reorganization at the same time, it seems reasonable that even keys that weren't presently being inserted on either of those threads could go missing.

If you give the map an explicit expected size then I think you'll see fewer failures. This is not a solution - the solution is to use a thread-safe implementation - but it demonstrates that a sparser Map does fewer reorganizations, which should reduce the frequency of errors (though not eliminate them).

final Map<Key, Integer> map = new IdentityHashMap<>(1_000);
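
As an aside, if you do switch to a thread-safe implementation, note that ConcurrentHashMap compares keys with equals()/hashCode(), whereas IdentityHashMap compares by identity. A minimal sketch of two options (variable names here are just illustrative); in this particular test the semantics end up the same either way, since Key.equals already uses ==:

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keeps identity-based key comparison; all access is serialized on a single lock.
final Map<Key, Integer> identityMap = Collections.synchronizedMap(new IdentityHashMap<>());

// Scales better under contention, but compares keys with equals()/hashCode()
// (which behaves like identity here only because Key.equals uses ==).
final Map<Key, Integer> concurrentMap = new ConcurrentHashMap<>();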

CodePudding user response:

This is a snippet of the code from IdentityHashMap::resize

        Object[] oldTable = table;
        ...
        Object[] newTable = new Object[newLength];
        ...
        for (int j = 0; j < oldLength; j += 2) {
            Object key = oldTable[j];
            if (key != null) {
                Object value = oldTable[j + 1];
>>>>>>>>>>>>>>  oldTable[j] = null;
>>>>>>>>>>>>>>  oldTable[j + 1] = null;
                int i = hash(key, newLength);
                while (newTable[i] != null)
                    i = nextKeyIndex(i, newLength);
                newTable[i] = key;
                newTable[i + 1] = value;
            }
        }
        table = newTable;

It sets the slots in the old array to null as it copies entries into the new array, then assigns the new array to the table field.

And you have multiple threads doing this concurrently, so sometimes this happens:

  1. Thread A and Thread B start resizing at the same time, so they see the same reference to oldTable. Thread A successfully copies "main" from oldTable and nulls out the old slots. Thread B is a bit behind and therefore can no longer see "main", since Thread A has already nulled it out.
  2. Thread A assigns its newTable (which contains "main") to the field, then Thread B assigns its own newTable to the field, but because of step 1 it doesn't contain "main".
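
The same copy-then-publish race can be reproduced outside the JDK with a stripped-down model of that resize loop. The class below is purely illustrative (it is not the real IdentityHashMap code, and the loss is timing-dependent, so a given run may not show it), but when it does fail, it fails for exactly the reason described in steps 1 and 2:

public class ResizeRaceSketch {
    // One key/value pair stored flat, like IdentityHashMap's table: [key, value].
    static Object[] table = {"main", 1};

    // Mimics the resize pattern: copy entries into a new array,
    // nulling the old slots as you go, then publish the new array.
    static void resizeLikeIdentityHashMap() {
        Object[] oldTable = table;              // both threads may grab the same array
        Object[] newTable = new Object[oldTable.length * 2];
        for (int j = 0; j < oldTable.length; j += 2) {
            Object key = oldTable[j];
            if (key != null) {
                Object value = oldTable[j + 1];
                oldTable[j] = null;             // the other thread can now see an empty slot
                oldTable[j + 1] = null;
                newTable[j] = key;
                newTable[j + 1] = value;
            }
        }
        table = newTable;                       // last writer wins: may publish a table without "main"
    }

    public static void main(String[] args) throws InterruptedException {
        for (int run = 0; run < 100_000; run++) {
            table = new Object[] {"main", 1};
            Thread a = new Thread(ResizeRaceSketch::resizeLikeIdentityHashMap);
            Thread b = new Thread(ResizeRaceSketch::resizeLikeIdentityHashMap);
            a.start();
            b.start();
            a.join();
            b.join();
            if (table[0] == null) {             // "main" was dropped during the concurrent "resizes"
                System.out.println("Lost the key on run " + run);
                return;
            }
        }
        System.out.println("Key survived every run (the race is timing-dependent)");
    }
}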