How to utilize parallelism to "add" two Dictionaries together faster?-CodePudding

I am fairly new to C#. I would like to "add" two Dictionaries together using parallelism.

Example:

Dictionary<string, int> dict1 = new()
{
    { "a",  1 },
    { "b",  2 },
    { "c",  3 },
    //...
};

Dictionary<string, int> dict2 = new()
{
    { "a",  1 },
    { "b",  2 },
    { "c",  3 },
    //...
};

Result of dict1 dict2:

Dictionary<string, int> dict3 = new()
{
    { "a",  2 },
    { "b",  4 },
    { "c",  6 },
    //...
};

I currently have a solution where I wrap a Dictionary<string, int> in my Modifier class:

public class Modifier
{
    public Dictionary<string, int> modifier = new Dictionary<string, int>();

    public void Add(string key, int value){
        modifier.Add(key, value);
    }

    public bool ContainsKey(string key){
        return modifier.ContainsKey(key);
    }

    public int this[string key]{
        get => modifier[key];
        set => modifier[key] = value;
    }

    public static Modifier operator  (Modifier a, Modifier b){
        foreach (string key in b.modifier.Keys){
            if (a.ContainsKey(key)){
                a[key]  = b[key];
            } else {
                a.Add(key, b[key]);
            }
        }
        return a;
    }
}

To add, I do Modifier result = a b (where a and b are Modifier objects). However, for my application, I will be doing many of these "additions" on large dictionaries and am afraid it will not scale well.

I am not very familiar with C# parallel programming. I thought about splitting the keys into equal sub-groups and performing the addition one each group, but I am not quite sure how to implement it, or even if it is optimal.

What is the most efficient way to do this addition?

CodePudding user response：

Parallelism is unlikely to yield impressive performance improvements in this case. Actually it's more likely to make your operators slower than faster, on top of being tricky to implement and easy to get it wrong. My suggestion is to focus at optimizing the single-thread performance of your operators. Here is a suggestion:

public static Modifier operator  (Modifier a, Modifier b)
{
    foreach ((string key, int value) in b.modifier)
    {
        ref int valueRef = ref CollectionsMarshal
            .GetValueRefOrAddDefault(a.modifier, key, out bool exists);
        valueRef = exists ? valueRef   value : value;
    }
    return a;
}

Only one string hashing is performed in each iteration of the foreach loop. This alone will probably give you a performance boost of 100% or more. You can read about the CollectionsMarshal.GetValueRefOrAddDefault method in the docs (it is available from .NET 6 and later). Be careful, it's a low-level method.

You might be able to improve the performance even further by initializing the dictionaries with a specialized IEqualityComparer<string>, provided that you have specific knowledge about the inner patterns of the actual strings that will be used as keys. You might find interesting a related API proposal on GitHub: Provide optimized read-only collections.