I am looking at migrating from Dictionary to ConcurrentDictionary for a multithreaded environment.
Specific to my use case, a key-value pair would typically be <string, List<T>>.
- What do I need to look out for?
- How do I implement successfully for thread safety?
- How do I manage reading key and values in different threads?
- How do I manage updating key and values in different threads?
- How do I manage adding/removing key and values in different threads?
CodePudding user response:
What do I need to look out for?
Depends on what you are trying to achieve :)
How do I manage reading key and values in different threads?
How do I manage updating key and values in different threads?
How do I manage adding/removing key and values in different threads?
Those operations are handled by the dictionary itself in a thread-safe manner, with several caveats:
- Functions accepting factories, like ConcurrentDictionary<TKey,TValue>.GetOrAdd(TKey, Func<TKey,TValue>), are not thread-safe in terms of factory invocation (for example, the dictionary does not guarantee that the factory is invoked only once if multiple threads try to get or add the same item). Quote from the docs:
  "All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary<TKey,TValue> class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation."
- In your particular case the value, List<T>, is not thread-safe itself. So while dictionary operations will be thread-safe (with the exception from the previous point), mutating operations on the value itself will not be. Consider using something like ConcurrentBag or switching to IReadOnlyDictionary.
- Personally, I would be cautious about working with a concurrent dictionary via explicitly implemented interfaces like IDictionary<TKey, TValue> and/or the indexer (this can lead to race conditions in read-update-write scenarios).
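To make the first two caveats concrete, here is a minimal sketch, assuming the values stay as List<T> and every thread that touches a list agrees to lock on that list instance. The names ListValueSketch, Append and RunDemo are hypothetical, not part of any library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ListValueSketch
{
    // Hypothetical helper: appends an item to the list stored under the key.
    public static void Append<T>(ConcurrentDictionary<string, List<T>> map, string key, T item)
    {
        // Under contention the factory may run more than once, but every
        // caller gets back the single list that was actually stored.
        List<T> list = map.GetOrAdd(key, _ => new List<T>());

        // List<T> is not thread-safe, so the mutation is guarded by a lock
        // on the list instance (assumption: all readers and writers do the same).
        lock (list)
        {
            list.Add(item);
        }
    }

    // Appends 10,000 items from many threads and returns the final count.
    public static int RunDemo()
    {
        var map = new ConcurrentDictionary<string, List<int>>();
        Parallel.For(0, 10_000, i => Append(map, "numbers", i));

        List<int> numbers = map["numbers"];
        lock (numbers)
        {
            return numbers.Count;
        }
    }
}
```

This only protects the list while the key stays in the dictionary. If another thread removes the key between GetOrAdd and the lock, the item is added to a list that is no longer reachable, so removal needs separate coordination.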
CodePudding user response:
The ConcurrentDictionary<TKey,TValue> collection is surprisingly difficult to master. The pitfalls that are waiting to trap the unwary are numerous and subtle. Here are some of them:
- Giving the impression that the ConcurrentDictionary<TKey,TValue> blesses with thread-safety everything it contains. That's not true. If the TValue is a mutable class, and is allowed to be mutated by multiple threads, it can be corrupted just as easily as if it wasn't contained in the dictionary.
- Using the ConcurrentDictionary<TKey,TValue> with patterns familiar from the Dictionary<TKey,TValue>. Race conditions can trivially emerge. For example, if (dict.ContainsKey(x)) list = dict[x] is wrong. In a multithreaded environment it is entirely possible that the key x will be removed between the dict.ContainsKey(x) and the list = dict[x], resulting in a KeyNotFoundException. The ConcurrentDictionary<TKey,TValue> is equipped with special atomic APIs, such as TryGetValue, that should be used instead of the previous chatty check-then-act pattern.
- Using the Count == 0 for checking if the dictionary is empty. The Count property is very cheap for a Dictionary<TKey,TValue>, and very expensive for a ConcurrentDictionary<TKey,TValue>. The correct property to use is the IsEmpty.
- Assuming that the AddOrUpdate method can be safely used for updating a mutable TValue object. This is not a correct assumption. The "Update" in the name of the method means "update the dictionary, by replacing an existing value with a new value". It doesn't mean "modify an existing value".
- Assuming that enumerating a ConcurrentDictionary<TKey,TValue> will yield the entries that were stored in the dictionary at the point in time that the enumeration started. That's not true. The enumerator does not maintain a snapshot of the dictionary. The behavior of the enumerator is not documented precisely. It's not even guaranteed that a single enumeration of a ConcurrentDictionary<TKey,TValue> will yield unique keys. In case you want to do an enumeration with snapshot semantics you must first take a snapshot explicitly with the (expensive) ToArray method, and then enumerate the snapshot. You might even consider switching to an ImmutableDictionary<TKey,TValue>, which is exceptionally good at providing these semantics.
- Assuming that calling extension methods on the ConcurrentDictionary<TKey,TValue>'s interfaces is safe. This is not the case. For example, the ToArray method is safe because it's a native method of the class. The ToList is not safe because it is a LINQ extension method on the IEnumerable<KeyValuePair<TKey,TValue>> interface. This method internally first calls the Count property of the ICollection<KeyValuePair<TKey,TValue>> interface, and then the CopyTo of the same interface. In a multithreaded environment the Count obtained by the first operation might not be compatible with the second operation, resulting in either an ArgumentException, or a list that contains empty elements at the end.
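Several of the pitfalls above can be sidestepped together. The following is only a sketch, assuming the values are switched from List<T> to ImmutableList<T> (from System.Collections.Immutable, a separate package on .NET Framework). The class and method names are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class AtomicApiSketch
{
    // One atomic lookup instead of a ContainsKey check followed by the indexer.
    public static ImmutableList<int> Lookup(
        ConcurrentDictionary<string, ImmutableList<int>> dict, string key)
    {
        return dict.TryGetValue(key, out var list) ? list : ImmutableList<int>.Empty;
    }

    // AddOrUpdate replaces the stored value with a new one. Because the value
    // is immutable, the update delegate builds a new list instead of mutating
    // the existing one. The delegates may run more than once, so they must be
    // free of side effects.
    public static void Append(
        ConcurrentDictionary<string, ImmutableList<int>> dict, string key, int item)
    {
        dict.AddOrUpdate(
            key,
            _ => ImmutableList.Create(item),
            (_, existing) => existing.Add(item));
    }

    // IsEmpty instead of Count == 0.
    public static bool HasNoEntries(ConcurrentDictionary<string, ImmutableList<int>> dict)
    {
        return dict.IsEmpty;
    }

    // The instance method ToArray takes an explicit snapshot that can be
    // enumerated safely afterwards.
    public static KeyValuePair<string, ImmutableList<int>>[] Snapshot(
        ConcurrentDictionary<string, ImmutableList<int>> dict)
    {
        return dict.ToArray();
    }
}
```

The trade-off is that every append allocates a new list, which may or may not matter depending on how write-heavy the workload is.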
In conclusion, migrating from a Dictionary<TKey,TValue> to a ConcurrentDictionary<TKey,TValue> is not trivial. In many scenarios sticking with the Dictionary<TKey,TValue> and adding synchronization around it might be an easier (and safer) path to thread-safety. IMHO the ConcurrentDictionary<TKey,TValue> should be considered more as a performance optimization over a synchronized Dictionary<TKey,TValue> than as the tool of choice when a dictionary is needed in a multithreading scenario.
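For comparison, a synchronized Dictionary<string, List<T>> could look roughly like the sketch below. SynchronizedMultiMap and its members are hypothetical names, and the single lock object is an assumption that favors simplicity over throughput.

```csharp
using System;
using System.Collections.Generic;

public sealed class SynchronizedMultiMap<T>
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, List<T>> _map = new Dictionary<string, List<T>>();

    // Adds the item under the key, creating the list on first use.
    public void Add(string key, T item)
    {
        lock (_gate)
        {
            if (!_map.TryGetValue(key, out var list))
            {
                list = new List<T>();
                _map[key] = list;
            }
            list.Add(item);
        }
    }

    // Removes the key and its list; returns false if the key was absent.
    public bool Remove(string key)
    {
        lock (_gate)
        {
            return _map.Remove(key);
        }
    }

    // Returns a copy, so callers never touch the shared list outside the lock.
    public IReadOnlyList<T> GetSnapshot(string key)
    {
        lock (_gate)
        {
            return _map.TryGetValue(key, out var list) ? list.ToArray() : Array.Empty<T>();
        }
    }
}
```

Because the dictionary and the lists are guarded by the same lock, adding, removing and reading cannot interleave, at the cost of serializing all access.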