I have a method that uses recursion to build a hierarchy of files and folders from an API (Graph). When I recurse inside a for loop, it works as expected and returns a hierarchy with 665 files. This takes about a minute, though, because it only fetches one folder at a time, whereas doing it with Task.WhenAll takes only 10 seconds.
With Task.WhenAll, however, I get inconsistent results: it returns the hierarchy with anywhere from 661 to 665 files depending on the run, with the exact same code. I'm using the variable totalFileCount as an indication of how many files it has found.
Obviously I'm doing something wrong, but I can't quite figure out what. Any help is greatly appreciated!
For loop
for (int i = 0; i < folders.Count; i++)
{
    var folder = folders[i];
    await GetSharePointHierarchy(folder, folderItem, $"{outPath}{folder.Name}\\");
}
Task.WhenAll
var tasks = new List<Task>();
for (int i = 0; i < folders.Count; i++)
{
    var folder = folders[i];
    var task = GetSharePointHierarchy(folder, folderItem, $"{outPath}{folder.Name}\\");
    tasks.Add(task);
}
await Task.WhenAll(tasks);
Full method
public async Task<GraphFolderItem> GetSharePointHierarchy(DriveItem currentDrive, GraphFolderItem parentFolderItem, string outPath = "")
{
    IEnumerable<DriveItem> children = await graphHandler.GetFolderChildren(sourceSharepointId, currentDrive.Id);
    var folders = new List<DriveItem>();
    var files = new List<DriveItem>();
    var graphFolderItems = new List<GraphFolderItem>();
    foreach (var item in children)
    {
        if (item.Folder != null)
        {
            System.IO.Directory.CreateDirectory(outPath + item.Name);
            //Console.WriteLine(outPath + item.Name);
            folders.Add(item);
        }
        else
        {
            totalFileCount++;
            files.Add(item);
        }
    }
    var folderItem = new GraphFolderItem
    {
        SourceFolder = currentDrive,
        ItemChildren = files,
        FolderChildren = graphFolderItems,
        DownloadPath = outPath
    };
    parentFolderItem.FolderChildren.Add(folderItem);
    for (int i = 0; i < folders.Count; i++)
    {
        var folder = folders[i];
        await GetSharePointHierarchy(folder, folderItem, $"{outPath}{folder.Name}\\");
    }
    return parentFolderItem;
}
CodePudding user response:
This is a race condition. When code runs in parallel, shared variables of ordinary types should not be modified without synchronization. Use a thread-safe mechanism that fits your requirements, such as thread-safe collections, lock, Monitor, or Interlocked.
In this case, Interlocked.Increment is a good approach. Replace the statement that increments totalFileCount with the following:
Interlocked.Increment(ref totalFileCount);
For more background, see "Thread Safe concept in details" or "Thread-safety".
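To see why the plain increment loses counts, here is a minimal, self-contained sketch rather than the asker's actual code. CounterDemo, CountFilesAsync and RunAsync are hypothetical names, and Task.Yield stands in for the awaited Graph call so that each continuation resumes on a thread-pool thread (assuming a console app with no synchronization context).

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int plainCount;       // incremented with ++ (not atomic)
    private static int interlockedCount; // incremented with Interlocked.Increment

    // Hypothetical stand-in for one recursive call that finds `files` files.
    private static async Task CountFilesAsync(int files)
    {
        await Task.Yield(); // resume on a thread-pool thread, as after an awaited API call
        for (int i = 0; i < files; i++)
        {
            plainCount++;                                // read-modify-write, updates can be lost
            Interlocked.Increment(ref interlockedCount); // atomic, no updates are lost
        }
    }

    public static async Task<(int PlainCount, int InterlockedCount)> RunAsync(int tasks, int filesPerTask)
    {
        plainCount = 0;
        interlockedCount = 0;
        await Task.WhenAll(Enumerable.Range(0, tasks).Select(_ => CountFilesAsync(filesPerTask)));
        return (plainCount, interlockedCount);
    }

    public static async Task Main()
    {
        var (plain, interlocked) = await RunAsync(8, 100_000);
        Console.WriteLine($"plain: {plain}, interlocked: {interlocked}");
    }
}
```

The Interlocked total always equals tasks times filesPerTask (800000 here). The plain total usually comes out lower and varies between runs, which matches the 661 to 665 symptom in the question.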
CodePudding user response:
It seems the problem is that with Task.WhenAll the recursive calls run in parallel, whereas awaiting each call in turn inside the loop runs them one after another.
That is exactly your problem: the code inside the async function accesses a shared variable, totalFileCount, so multiple threads can access it at the same time.
To fix it and still run the code in parallel, wrap the access to totalFileCount in a lock statement, which allows only one thread at a time to execute the block:
lock (lockRefObject)
{
    totalFileCount++;
}
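One thing worth checking with the lock approach: in the question's method, parentFolderItem.FolderChildren.Add(folderItem) also appears to run from several parallel tasks, and List<T> is not safe for concurrent writers, so the same lock could cover that call too. The sketch below illustrates the idea with hypothetical stand-ins (LockDemo, VisitFolderAsync, and a List<string> in place of FolderChildren); it is not the asker's real types or API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LockDemo
{
    private static readonly object lockRefObject = new object();
    private static int totalFileCount;

    // Hypothetical stand-in for one folder visit: counts its files and
    // registers itself in the shared parent list, both under the same lock.
    private static async Task VisitFolderAsync(List<string> parentChildren, string name, int files)
    {
        await Task.Yield(); // resume on a thread-pool thread, as after an awaited API call
        lock (lockRefObject)
        {
            totalFileCount += files;
            parentChildren.Add(name); // List<T> is not safe for concurrent writers
        }
    }

    public static async Task<(int Files, int Folders)> RunAsync(int folders, int filesPerFolder)
    {
        totalFileCount = 0;
        var children = new List<string>();
        await Task.WhenAll(Enumerable.Range(0, folders)
            .Select(i => VisitFolderAsync(children, $"folder{i}", filesPerFolder)));
        return (totalFileCount, children.Count);
    }

    public static async Task Main()
    {
        var (files, folders) = await RunAsync(200, 3);
        Console.WriteLine($"files: {files}, folders: {folders}");
    }
}
```

The lock body contains no await, which is required because C# does not allow awaiting inside a lock statement. Keeping the locked section this short means the network calls still run in parallel and only the bookkeeping is serialized.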