Home > OS >  Migrating from Dictionary to ConcurrentDictionary, what are the common traps that I should be aware
Migrating from Dictionary to ConcurrentDictionary, what are the common traps that I should be aware

Time:01-16

I am looking at migrating from Dictionary to ConcurrentDictionary for a multi thread environment.

Specific to my use case, a kvp would typically be <string, List<T>>

  • What do I need to look out for?
  • How do I implement successfully for thread safety?
  • How do I manage reading key and values in different threads?
  • How do I manage updating key and values in different threads?
  • How do I manage adding/removing key and values in different threads?

CodePudding user response:

What do I need to look out for?

Depends on what you are trying to achieve :)

How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?

Those should be handled by the dictionary itself in thread safe manner. With several caveats:

  • Functions accepting factories like ConcurrentDictionary<TKey,TValue>.GetOrAdd(TKey, Func<TKey,TValue>) are not thread safe in terms of factory invocation (i.e. dictionary does not guarantee that the factory would be invoked only one time if multiple threads try to get or add the item, for example). Quote from the docs:

    All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary<TKey,TValue> class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.

  • In your particular case value - List<T> is not thread safe itself so while dictionary operations will be thread safe (with the exception from the previous point), mutating operations with value itself - will not, consider using something like ConcurrentBag or switching to IReadOnlyDictionary.

  • Personally I would be cautions working with concurrent dictionary via explicitly implemented interfaces like IDictionary<TKey, TValue> and/or indexer (can lead to race conditions in read-update-write scenarios).

CodePudding user response:

The ConcurrentDictionary<TKey,TValue> collection is surprisingly difficult to master. The pitfalls that are waiting to trap the unwary are numerous and subtle. Here are some of them:

  1. Giving the impression that the ConcurrentDictionary<TKey,TValue> blesses with thread-safery everything it contains. That's not true. If the TValue is a mutable class, and is allowed to be mutated by multiple threads, it can be corrupted just as easily as if it wasn't contained in the dictionary.
  2. Using the ConcurrentDictionary<TKey,TValue> with patterns familiar from the Dictionary<TKey,TValue>. Race conditions can trivially emerge. For example if (dict.Contains(x)) list = dict[x] is wrong. In a multithreaded environment it is entirely possible that the key x will be removed between the dict.Contains(x) and the list = dict[x], resulting in a KeyNotFoundException. The ConcurrentDictionary<TKey,TValue> is equiped with special atomic APIs that should be used instead of the previous chatty check-then-act pattern.
  3. Using the Count == 0 for checking if the dictionary is empty. The Count property is very cheep for a Dictionary<TKey,TValue>, and very expensive for a ConcurrentDictionary<TKey,TValue>. The correct property to use is the IsEmpty.
  4. Assuming that the AddOrUpdate method can be safely used for updating a mutable TValue object. This is not a correct assumption. The "Update" in the name of the method means "update the dictionary, by replacing an existing value with a new value". It doesn't mean "modify an existing value".
  5. Assuming that enumerating a ConcurrentDictionary<TKey,TValue> will yield the entries that were stored in the dictionary at the point in time that the enumeration started. That's not true. The enumerator does not maintain a snapshot of the dictionary. The behavior of the enumerator is not documented precisely. It's not even guaranteed that a single enumeration of a ConcurrentDictionary<TKey,TValue> will yield unique keys. In case you want to do an enumeration with snapshot semantics you must first take a snapshot explicitly with the (expensive) ToArray method, and then enumerate the snapshot. You might even consider switching to an ImmutableDictionary<TKey,TValue>, which is exceptionally good at providing these semantics.
  6. Assuming that calling extension methods on ConcurrentDictionary<TKey,TValue>s interfaces is safe. This is not the case. For example the ToArray method is safe because it's a native method of the class. The ToList is not safe because it is a LINQ extension method on the IEnumerable<KeyValuePair<TKey,TValue>> interface. This method internally first calls the Count property of the ICollection<KeyValuePair<TKey,TValue>> interface, and then the CopyTo of the same interface. In a multithread environment the Count obtained by the first operation might not be compatible with the second operation, resulting in either an ArgumentException, or a list that contains empty elements at the end.

In conclusion, migrating from a Dictionary<TKey,TValue> to a ConcurrentDictionary<TKey,TValue> is not trivial. In many scenarios sticking with the Dictionary<TKey,TValue> and adding synchronization around it might be an easier (and safer) path to thread-safety. IMHO the ConcurrentDictionary<TKey,TValue> should be considered more as a performance-optimization over a synchronized Dictionary<TKey,TValue>, than as the tool of choice when a dictionary is needed in a multithreading scenario.

  • Related