Which collection should I use to read elements from multiple threads and fully overwrite the collection

Time:06-03

I'm going to use a static collection that will be used for reading by the core process and fully updated every X mins by the background service.

The background process will load updated data from the database every X mins and set the received dataset into this static collection.

The core process will receive many tasks that check whether certain values exist in this collection. Each task will be processed on a separate thread. There will be a lot of requests and lookups must be extremely fast, so I can't query the database for each request; I need an updatable in-memory collection.

public class LoadedData
{
    public static HashSet<string> Keys { get; set; }
}

public class CoreProcess
{
    public bool ElementExists(string key)
    {
        return LoadedData.Keys.Contains(key);
    }
}

public class BackgroundProcess
{
    public async Task LoadData()
    {
        while (true)
        {
            LoadedData.Keys = GetKeysFromDb();
            await Task.Delay(TimeSpan.FromMinutes(5));
        }
    }
}

So, I'm looking for the best solution for this. I was thinking about using HashSet<T> because I'm sure that each element in the collection will be unique. But HashSet<T> is not thread-safe. So I started considering BlockingCollection<T>, ConcurrentBag<T>, ConcurrentDictionary<T, byte>, but then I wondered if I needed a thread-safe collection here at all. Looks like not, because I'm not going to add/update/remove particular elements in the collection. Only full rewrite from the database.

  1. So, does it mean that I can just use simple HashSet<T>?

  2. Which collection would you use to solve it?

  3. And in general, will there be any issues with a simultaneous reading by the core process and full overwriting of the collection by the background process?

CodePudding user response:

So the HashSet<string> becomes effectively immutable as soon as it becomes the value of the LoadedData.Keys property. In this case your code is almost OK. The only missing ingredient is to ensure the visibility of this property by all threads involved.

In theory it is possible that the compiler or the JIT compiler (the "jitter") might use a cached/stale value of the property, instead of looking at what is currently stored in main memory. In practice you might never experience this phenomenon, but if you want to play by the rules you must read and write this property with volatile semantics. If Keys were a field, you could just decorate it with the volatile keyword. Since it's a property, you must do a bit more work:

public class LoadedData
{
    private static volatile HashSet<string> _keys;

    public static HashSet<string> Keys
    {
        get => _keys;
        set => _keys = value;
    }
}

...or using the Volatile class instead of the volatile keyword:

using System.Collections.Generic;
using System.Threading; // Volatile lives here

public class LoadedData
{
    private static HashSet<string> _keys;

    public static HashSet<string> Keys
    {
        get => Volatile.Read(ref _keys);
        set => Volatile.Write(ref _keys, value);
    }
}

A final cautionary note: The immutability of the HashSet<string> is not enforced by the compiler. It's just a verbal contract that you make with your future self, and with any other future maintainers of your code. If some mutating code finds its way into your code-base, the behavior of your program will become officially undefined. If you want to guard yourself against this scenario, the most semantically correct way to do it is to replace the HashSet<string> with an ImmutableHashSet<string>. The immutable collections are significantly slower than their mutable counterparts (typically at least 10x slower; ImmutableHashSet<T> is tree-based, so a lookup is O(log n) rather than O(1)), so it's a trade-off. You can have peace of mind, or ultimate performance, but not both.
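The ImmutableHashSet<string> variant described above could look roughly like the following. This is only a sketch: the Replace helper and the choice to start from an empty set are assumptions made for illustration and are not part of the original code.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading;

public static class LoadedData
{
    // Assumption: start with an empty set so readers never observe null
    // before the first load completes.
    private static ImmutableHashSet<string> _keys = ImmutableHashSet<string>.Empty;

    public static ImmutableHashSet<string> Keys
    {
        get => Volatile.Read(ref _keys);
        set => Volatile.Write(ref _keys, value);
    }

    // Hypothetical helper: builds a fresh immutable snapshot from the
    // loaded keys and publishes it with a single reference write.
    public static void Replace(IEnumerable<string> freshKeys)
    {
        Keys = ImmutableHashSet.CreateRange(freshKeys);
    }
}
```

With this shape, the background process would call LoadedData.Replace(...) with whatever the database returns, and readers keep calling LoadedData.Keys.Contains(key). Any attempt to "modify" the set returns a new instance instead of changing the published one.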

CodePudding user response:

So, does it mean that I can just use simple HashSet?
Yes, as long as you are just replacing the HashSet, this code will work fine.

Which collection would you use to solve it?
It looks fine the way you've got it. HashSet is a good collection to enforce uniqueness, which it looks like you are after.

And in general, will there be any issues with a simultaneous reading by the core process and full overwriting of the collection by the background process?
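No, as long as the background process only replaces the reference and never mutates a set that has already been published.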
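To illustrate why a full overwrite does not disturb readers, here is a small single-threaded sketch. The SnapshotDemo class and its Run method are hypothetical names used only for this example. A reader that has already taken a reference to the old set keeps seeing that old set, untouched, while new readers see the replacement.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class SnapshotDemo
{
    private static HashSet<string> _keys;

    public static (bool OldHasA, bool NewHasA, bool NewHasC) Run()
    {
        // Initial data, as if loaded by the background process.
        Volatile.Write(ref _keys, new HashSet<string> { "a", "b" });

        // A reader captures the current reference once.
        HashSet<string> snapshot = Volatile.Read(ref _keys);

        // The background process publishes a completely new set.
        // The old set is not modified; only the reference changes.
        Volatile.Write(ref _keys, new HashSet<string> { "b", "c" });

        HashSet<string> current = Volatile.Read(ref _keys);

        return (snapshot.Contains("a"), current.Contains("a"), current.Contains("c"));
    }
}
```

One practical consequence: a reader that needs several lookups to agree with each other should read the property once into a local variable and query that local, rather than reading the property for every lookup.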
