I would like someone to clarify the difference between awaiting many Tasks and manually waiting for their completion in a collection.
I have an XML reader class that has to process several hundred files, and while optimizing it I found an interesting difference. I created two functions, both of which call my
private async Task ParseModelsFromFile(ConcurrentDictionary<TSettings, TResults> datas, string filePath)
function. This basically loads all my models from the XML file on the given filePath
. The only difference is in the calling and awaiting of this ParseModelsFromFile()
function.
The first function: ProcessFiles()
uses the await
keyword, the second: ProcessFilesWithTasks
creates a Task
object for every single filePath
, puts it in a collection of Task
s and waits for all of them to complete.
The second version with the Task collection is much faster. It takes about half the time. I expected these two functions to do the same thing, with the only difference being the use of the async
keyword.
I tested this with different numbers of files, and the function using await always comes out about 2x slower.
private async Task ProcessFiles(ConcurrentDictionary<TSettings, TResults> datas, BlockingCollection<string> filePaths)
{
    if (filePaths.Count == 0)
        return;
    foreach (var filePath in filePaths.GetConsumingEnumerable())
    {
        await ParseModelsFromFile(datas, filePath);
    }
}
private Task ProcessFilesWithTasks(ConcurrentDictionary<TSettings, TResults> datas, BlockingCollection<string> filePaths)
{
    if (filePaths.Count == 0)
        return Task.CompletedTask;
    List<Task> runningTasks = new List<Task>();
    foreach (var filePath in filePaths.GetConsumingEnumerable())
    {
        runningTasks.Add(ParseModelsFromFile(datas, filePath));
    }
    Task allTasks = Task.WhenAll(runningTasks.ToArray());
    allTasks.Wait();
    return Task.CompletedTask;
}
Answer:
If you understand how asynchronous code works, you'll understand why. So let's first look at what ParseModelsFromFile
does. When it is called, it runs until the first await
inside that method, which is probably where it reads the file. While the OS and hardware go out to read the file, there is nothing for your application to do. It is at this point that ParseModelsFromFile
returns. It returns a Task
that you can use to know when the rest of the method completes.
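To make that control flow concrete, here is a minimal console sketch. FakeParseAsync is a hypothetical stand-in for ParseModelsFromFile, and Task.Delay is only an assumption standing in for the real file read. The numbered messages show the order in which the lines are expected to print.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for ParseModelsFromFile; Task.Delay simulates the file read.
    static async Task FakeParseAsync()
    {
        Console.WriteLine("1. runs synchronously up to the first await");
        await Task.Delay(100); // incomplete task: control returns to the caller here
        Console.WriteLine("3. continuation runs once the delay has completed");
    }

    static async Task Main()
    {
        // The call returns as soon as FakeParseAsync hits its first incomplete await.
        Task pending = FakeParseAsync();
        Console.WriteLine("2. caller resumes; IsCompleted = " + pending.IsCompleted);

        await pending; // wait for the rest of the method without blocking a thread
        Console.WriteLine("4. task finished");
    }
}
```

The Task held in pending is the handle the caller uses to find out when the remainder of the method has finished.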
Knowing that, we can talk about the differences between your two implementations:
Using await
means: don't run the rest of this method until this task completes. The thread is freed up to do other things, which is the benefit of asynchronous programming. But execution of your method stops until the task is complete. So using await
inside the foreach
loop will process one file and wait until it is completely processed before moving to the next iteration of the loop.
Whereas in ProcessFilesWithTasks
, because you're not using await
when you call ParseModelsFromFile
, you are using the time that the OS spends retrieving the file to move to the next iteration of the loop and start running ParseModelsFromFile
for the next file.
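The difference between the two loops can be reproduced with a rough timing sketch. FakeParseAsync is again a hypothetical placeholder, and the 200 ms delay and the count of five are arbitrary assumptions, not measurements from the question. The sequential loop should take about five times the delay, while the concurrent version should finish in roughly one delay.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for ParseModelsFromFile: 200 ms of simulated I/O.
    static Task FakeParseAsync() => Task.Delay(200);

    static async Task Main()
    {
        var sw = Stopwatch.StartNew();

        // Like ProcessFiles: each call is awaited before the next one starts.
        for (int i = 0; i < 5; i++)
            await FakeParseAsync();
        Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();

        // Like ProcessFilesWithTasks: start everything first, then wait once.
        var tasks = new List<Task>();
        for (int i = 0; i < 5; i++)
            tasks.Add(FakeParseAsync());
        await Task.WhenAll(tasks);
        Console.WriteLine($"concurrent: {sw.ElapsedMilliseconds} ms");
    }
}
```

Real file parsing also involves CPU work after the read, so the speedup you measure will depend on how much of the time is spent waiting on I/O.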
If this is in an application where there is no synchronization context (ASP.NET Core, a console app, or a Windows service, for example), then the continuation of each call to ParseModelsFromFile
(everything after the first await
) can run on a thread-pool thread, so the continuations for different files may run in parallel, which can also make the whole batch finish faster.
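You can observe this in a console app by printing the managed thread ID before and after the await. This is only a sketch: FakeParseAsync and the Task.Delay call are assumptions for illustration, and the thread IDs you see will vary from run to run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for ParseModelsFromFile.
    static async Task FakeParseAsync(int id)
    {
        Console.WriteLine($"file {id}: before await on thread {Environment.CurrentManagedThreadId}");
        await Task.Delay(100); // simulated file read
        // With no synchronization context, this part is scheduled on the thread pool.
        Console.WriteLine($"file {id}: after await on thread {Environment.CurrentManagedThreadId}");
    }

    static async Task Main()
    {
        var tasks = new List<Task>();
        for (int i = 1; i <= 3; i++)
            tasks.Add(FakeParseAsync(i));
        await Task.WhenAll(tasks);
    }
}
```

The "before await" lines all run on the calling thread, while the "after await" lines typically report thread-pool threads, and not necessarily the same one for each file.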
The only change I would make is to make ProcessFilesWithTasks
async
and use await
on Task.WhenAll()
instead of .Wait()
:
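private async Task ProcessFilesWithTasks(ConcurrentDictionary<TSettings, TResults> datas, BlockingCollection<string> filePaths)
{
    if (filePaths.Count == 0)
        return; // an async Task method cannot return a value, so a bare return is used
    // Start every parse without awaiting it, so the file reads overlap.
    List<Task> runningTasks = new List<Task>();
    foreach (var filePath in filePaths.GetConsumingEnumerable())
    {
        runningTasks.Add(ParseModelsFromFile(datas, filePath));
    }
    // Asynchronously wait for all of them instead of blocking with .Wait().
    await Task.WhenAll(runningTasks);
}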
Wait()
will block the thread from doing anything else. It will just sit idle. Using await
will free up the thread to do other things while it waits.
You also don't need to call .ToArray()
since Task.WhenAll()
accepts IEnumerable<Task>
, which List<Task>
is.
Microsoft has some really well-written articles on Asynchronous programming with async and await that I think you would benefit from reading.