Home > database >  Please help me understand this threading issue with a dictionary
Please help me understand this threading issue with a dictionary

Time:12-15

I wrote a simple thread adding values to a dictionary from many other parts of the project:

public void AddValue(int kid, int vid)
{
    if(!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
    dic[kid].Add(vid);
}

When I ran the code, "sometimes" it'll show that certain key ID does not exist in the dictionary, I figured that's because different threads are "fighting for it".

But then again, "in theory", shouldn't multiple threads fighting for if(!dic.ContainsKey(kid)) dic.Add(kid, new List<int>()); instead since when different threads enter the method without initiating the dictionary should all satisfy the if condition and attempt to Add the key, thus the error should be "The key is already present in the dictionary" instead?

How come one could pass the if check and "still" not initiate the key yet?

This is really throwing me off, somebody please be so kind explaining it to me.

Much appreciated!

PS. I'm aware of the AutoResetEvent and probably could make it works without any error, I just don't understand how and why the if statement could be bypassed and am in need of some explanation, thank you very much for your help.

CodePudding user response:

Here is an example that recreates your problem,

using System;
using System.Collections.Generic;
using System.Linq;
                    
public class Program
{
    public static readonly IDictionary<int, IList<int>> dic = 
        new Dictionary<int, IList<int>>();
    
    public static void Main()
    {
        Enumerable
            .Range(1, 1000000)
            .AsParallel()
            .AsUnordered()
            .ForAll(i => AddValue(i % 100, i));
        
        Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
        Console.WriteLine(
            $"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
    }
    
    public static void AddValue(int kid, int vid)
    {
        if(!dic.ContainsKey(kid))
            dic.Add(kid, new List<int>());
        dic[kid].Add(vid);
    }
}

This throws exceptions with messages like,

"The given key 'n' was not present in the dictionary"

This is happening because the implementation of Dictionary<T> is not thread-safe. When Add is called but, before it returns, there is a short window when the result of ContainsKey differs from the result of the index accessor [key].

This is okay in non-concurrent code, its probably a more optimal implementation. The caller won't use the index accessor until after Add has returned.

However, in concurrent code, running on multiple threads, imagine thread A calls Add, and now we are in the short window where the results of ContainsKey and the index accessor [key] differ, then thread B calls ContainsKey and tries to access the value, the Exception is thrown.


If we change the code to,

using System;
using System.Collections.Concurrent;
using System.Linq;
                    
public class Program
{
    public static readonly ConcurrentDictionary<int, ConcurrentBag<int>> dic =
        new ConcurrentDictionary<int, ConcurrentBag<int>>();
    
    public static void Main()
    {
        Enumerable
            .Range(1, 1000000)
            .AsParallel()
            .AsUnordered()
            .ForAll(i => AddValue(i % 100, i));
        
        Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
        Console.WriteLine(
            $"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
    }
    
    public static void AddValue(int kid, int vid)
    {
        dic.GetOrAdd(kid, _ => new ConcurrentBag<int>()).Add(vid);
    }
}

Everything works as anticipated.

CodePudding user response:

I believe you use Dictionary<>. It is not thread safe. A Dictionary can support multiple readers concurrently, as long as the collection is not modified (not your case), so you'd to use ConcurrentDictionary instead https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=net-7.0.

CodePudding user response:

Dictionary is not thread-safe to solve this you can use ConsurrentDictionary or lock so that only one thread can perform the insertion operations.

Without lock or ConcurrentDictionary it'll create a race condition between multiple threads which will be difficult to debug.

Example: You can use a lock object based on your requirements, here I'm just using dic as a lock object.

 public void AddValue(int kid, int vid)
 {
        lock (dic)
        {
            if (!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
            dic[kid].Add(vid);
        }
 }
    

CodePudding user response:

Not sure I understand what actually did happen when you ran your program, but here's a thing that could happen:

thread A                               thread B
Enters AddValue(5,17)                     -
calls dic.ContainsKey(5)               Enters AddValue(5,12)
gets result, false                     calls dic.ContainsKey(5)
calls new List<int>()                  gets result, false
adds empty list A as dic[5]            calls new List<int>()
adds 17 to previously empty list A     adds empty list B as dic[5],
   -                                       overwriting list A
   -                                   adds 12 to previously empty list B 

End result, after both calls have finished, the list, dic[5], contains only one number, 12.


Here's a worse thing that could happen:

thread A                               thread B
Enters AddValue(2,42)                     -
calls dic.ContainsKey(2)                  -
gets result, false                        -
calls new List<int>()                     -
calls dic.add(5,...new list...)        Enters AddValue(7,99)
    in-process of mutating dic         calls dic.ContainsKey(7)
    still in process                       encounters corrupt data structure
    returns                                returns wrong result or maybe
returns                                    crashes the program
   -                                   ...

End results; thread B could wipe out the existing list for kid=7 because dic.ContainsKey(7) gave the wrong answer, or thread B may have attempted to call dic[7].add(99) when dic[7] is null (same reason), or program could have crashed within dic.ContainsKey(7).

  • Related