Is ProcessorCount is sensible value for maxCount in SemaphoreSlim?-CodePudding

In code of my colleagues I've noticed code like this:

private Task FetchAllKeysFromRedis(List<string> keys, ConcurrentBag<RedisModel> resultsBag, CancellationToken cancellationToken)
{
        var parallelism = Environment.ProcessorCount;
        var semafore = new SemaphoreSlim(initialCount: parallelism, maxCount: parallelism);
        var tasks = new List<Task>();
        foreach (var key in keys)
        {
            cancellationToken.ThrowIfCancellationRequested();
            tasks.Add(FetchFromRedis(key, semafore, resultsBag, cancellationToken));
        }

        return Task.WhenAll(tasks);

List<string> keys approximately contains around 1000 keys, and FetchFromRedis method asynchronously execute I/O operaion (fetch from Redis), so to sum up it executes around 1000 of I/O operations.

Critical section look like this:

private async Task FetchFromRedis(string key, SemaphoreSlim semafore, ConcurrentBag<RedisResult> resultsBag, CancellationToken cancellationToken)
{
     await semafore.WaitAsync(cancellationToken);
     try
     {
         var redisResult = await _getRedisResultFromRedis.ExecuteAsync(key, cancellationToken);
         if (redisResult != null)
              resultsBag.Add(redisResult);
      
         finally
         {
            semafore.Release();
         }
      }

My question: is Environment.ProcessorCount sensible maxCount for SemaphoreSlim? As it will limit count of threads that can step into critical section to very low, like 8, and time of execution will be much longer?

If it not sensible, then what is sensible value for maxCount of threads?

My opinion: since it involves IO, most of the time will just be waiting, no processor time needed. So limiting concurrency to the number of cores make little sense.

CodePudding user response：

I would agree with your reasoning, since the work looks to be network requests that is limited by latency or bandwith, restricting the concurrency to the number of CPU cores make little sense.

You probably need to do some testing to find a suitable maxCount value. But ideally you should have a system that is self-adapting to the actual circumstances. I'm honestly not sure what the best design would be, but I would take a look at dataflow to see if that can provide a better way to limit concurrency. Or at least expose the value as configuration so it can be adjusted at some later time.