I am trying to improve performane of some code which does some shopping function calling number of different vendors. 3rd party vendor call is async and results are processed to generate a result. Strucure of the code is as follows.
public async Task<List<ShopResult>> DoShopping(IEnumerable<Vendor> vendors)
{
var res = vendors.Select(async s => await DoShopAndProcessResultAsync(s));
await Task.WhenAll(res); ....
}
Since DoShopAndProcessResultAsync is both IO bound and CPU bound, and each vendor iteration is independant I think Task.Run can be used to do something like below.
public async Task<List<ShopResult>> DoShopping(IEnumerable<Vendor> vendors)
{
var res = vendors.Select(s => Task.Run(() => DoShopAndProcessResultAsync(s)));
await Task.WhenAll(res); ...
}
Using Task.Run as is having a performance gain and I can see multiple threads are being involved here from the order of execution of the calls. And it is running without any issue locally on my machine. However, it is a tasks of tasks scenario and wondering whether any pitfalls or this is deadlock prone in a high traffic prod environment.
What are your opinions on the approach of using Task.Run to parallelize async calls?
CodePudding user response:
What is alarming with the Task.Run
approach in your question, is that it depletes the ThreadPool
from available worker threads in a non-controlled manner. It doesn't offer any configuration option that would allow you to reduce the parallelism of each individual request, in favor of preserving the scalability of the whole service. That's something that might bite you in the long run.
Ideally you would like to control both the parallelism and the concurrency, and control them independently. For example you might want to limit the maximum concurrency of the I/O-bound work to 10, and the maximum parallelism of the CPU-bound work to 2. Regarding the former you could take a look at this question: How to limit the amount of concurrent async I/O operations?
Regarding the later, you could use a TaskScheduler
with limited concurrency. The ConcurrentExclusiveSchedulerPair
is a handy class for this purpose. Here is an example of how you could rewrite your DoShopping
method in a way that limits the ThreadPool
usage to two threads at maximum (per request), without limiting at all the concurrency of the I/O-bound work:
public async Task<ShopResult[]> DoShopping(IEnumerable<Vendor> vendors)
{
var scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxConcurrencyLevel: 2).ConcurrentScheduler;
var tasks = vendors.Select(s =>
{
return Task.Factory.StartNew(() => DoShopAndProcessResultAsync(s),
default, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap();
});
return await Task.WhenAll(tasks);
}
My personal preference though would be to use instead the new (.NET 6) Parallel.ForEachAsync
API. Apart from making it easy to control independently the concurrency and the parallelism, it also comes with a better behavior in case of exceptions. Instead of launching invariably all the async operations, it stops launching new operations as soon as a previously launched operation has failed. This can make a big difference in the responsiveness of your service, in case for example that all individual async operations are failing with a timeout exception. Which is something quite likely to happen a few times during the lifetime of a deployed service, unless you are calling APIs that never go down for maintenance.