.NET: collecting the same list in different tasks

Time:07-28

I created a list of objects, and I want to fill it from different tasks. The code looks correct, but it doesn't work.

This is my code:

var splittedDataList = Extensions.ListExtensions.SplitList(source, 500);

// Create a list of tasks
var poolTasks = new List<Task>();
var objectList = new List<Car>();

for (int i = 0; i < splittedDataList.Count; i++)
{
   var data = splittedDataList[i];

   poolTasks.Add(Task.Factory.StartNew(() =>
   {
       // Collect list of car
       objectList = CollectCarList(data); 
   }));
}

// Wait for all tasks to finish
Task.WaitAll(poolTasks.ToArray());

public List<Car> CollectCarList(List<Car> list)
{
  ///
  return list;
}

Answer:

The code is using Tasks as if they were threads to flatten a nested list. Tasks aren't threads; a Task is a promise that something will produce a result in the future. In JavaScript, the equivalent concept is actually called a Promise.
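To make the "promise" idea concrete: instead of every task assigning to one shared variable (which is why the question's objectList ends up holding only one chunk), each task can return its own result, and the caller combines the results once all tasks have finished. This is a minimal sketch that assumes int chunks stand in for the question's Car chunks; CollectChunk is a hypothetical stand-in for CollectCarList.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical input: two chunks of ints standing in for the List<Car> chunks
var chunks = new List<List<int>>
{
    Enumerable.Range(0, 500).ToList(),
    Enumerable.Range(500, 500).ToList(),
};

// Each Task<List<int>> is a promise of one chunk's result; nothing is shared
Task<List<int>>[] tasks = chunks
    .Select(chunk => Task.Run(() => CollectChunk(chunk)))
    .ToArray();

// Block until every promise has been fulfilled
Task.WaitAll(tasks);

// Reading Result no longer blocks; SelectMany keeps the chunks in input order
List<int> all = tasks.SelectMany(t => t.Result).ToList();

Console.WriteLine(all.Count); // 1000

// Hypothetical stand-in for CollectCarList: returns a copy of the chunk
static List<int> CollectChunk(List<int> chunk) => chunk.ToList();
```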

The question's code is effectively flattening a nested list. This can easily be done with Enumerable.SelectMany(), e.g.:

var cars = source.SelectMany(data => data).ToList();
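Here is a self-contained version of that one-liner. It assumes the nested input is a List<List<int>>, with ints standing in for Car objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nested input: a list of chunks, like the result of SplitList
var source = new List<List<int>>
{
    new List<int> { 1, 2, 3 },
    new List<int> { 4, 5 },
    new List<int> { 6 },
};

// SelectMany maps each chunk to its items and concatenates them in order
var cars = source.SelectMany(data => data).ToList();

Console.WriteLine(string.Join(", ", cars)); // 1, 2, 3, 4, 5, 6
```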

Flattening isn't an expensive operation, so there shouldn't be any need for parallelism. If there really are that many items, Parallel LINQ (PLINQ) can be used by calling .AsParallel(). The LINQ operators after that call are executed in parallel, and the results are collected at the end:

var cars = source.AsParallel()
                 .SelectMany(data => data)
                 .ToList();
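One detail worth knowing: by default PLINQ does not guarantee that results come back in source order. If the order of the flattened list matters, AsOrdered() tells PLINQ to preserve it, at some extra cost. A sketch, again using int chunks in place of cars:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// 100 chunks of 1,000 ints each, standing in for the split car list
var source = Enumerable.Range(0, 100)
    .Select(c => Enumerable.Range(c * 1000, 1000).ToList())
    .ToList();

// AsOrdered() keeps the output in the same order a sequential query would
var ordered = source.AsParallel()
                    .AsOrdered()
                    .SelectMany(data => data)
                    .ToList();

// Without AsOrdered() the same items come back, but possibly interleaved
var unordered = source.AsParallel()
                      .SelectMany(data => data)
                      .ToList();

Console.WriteLine(ordered.SequenceEqual(Enumerable.Range(0, 100_000))); // True
Console.WriteLine(unordered.Count); // 100000
```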

Parallel LINQ is far more useful when it's used to parallelize the really time-consuming processing before flattening:

var cars = source.AsParallel()
                 .Select(data => DoSomethingExpensive(data))
                 .SelectMany(data => data)
                 .ToList();
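A runnable sketch of that pattern follows. DoSomethingExpensive is hypothetical here: it keeps only the primes in each chunk using naive trial division, which is genuinely CPU-bound. WithDegreeOfParallelism is optional and only caps how many worker threads PLINQ uses; AsOrdered keeps the primes in ascending order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// 100 chunks covering 0..99,999
var source = Enumerable.Range(0, 100)
    .Select(c => Enumerable.Range(c * 1000, 1000).ToList())
    .ToList();

var primes = source.AsParallel()
                   .AsOrdered()
                   .WithDegreeOfParallelism(Environment.ProcessorCount)
                   .Select(data => DoSomethingExpensive(data)) // expensive part, runs in parallel
                   .SelectMany(data => data)                   // cheap flattening step
                   .ToList();

Console.WriteLine(primes.Count); // 9592

// Hypothetical expensive step: keep only the primes in one chunk
static List<int> DoSomethingExpensive(List<int> chunk) => chunk.Where(IsPrime).ToList();

// Naive trial division, deliberately CPU-heavy
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```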

Parallel LINQ is built for data parallelism: processing large amounts of in-memory data by partitioning the input and using worker tasks to process each partition with minimal synchronization between workers. It's definitely not meant for executing lots of asynchronous operations concurrently. There are other high-level classes for that, such as ActionBlock<T> from TPL Dataflow or, since .NET 6, Parallel.ForEachAsync.
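As one example of those options, Parallel.ForEachAsync (.NET 6 and later) runs an async body for each item with a bounded degree of concurrency. This sketch assumes a hypothetical LoadCarAsync that simulates an I/O call with Task.Delay. Results go into a ConcurrentBag<T> because the bodies run concurrently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var ids = Enumerable.Range(1, 20).ToList();
var results = new ConcurrentBag<string>();

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// At most 4 bodies are in flight at once; awaiting does not block a thread
await Parallel.ForEachAsync(ids, options, async (id, ct) =>
{
    var car = await LoadCarAsync(id, ct);
    results.Add(car); // ConcurrentBag is safe to add to from many threads
});

Console.WriteLine(results.Count); // 20

// Hypothetical async I/O call (e.g. a database query), simulated with a delay
static async Task<string> LoadCarAsync(int id, CancellationToken ct)
{
    await Task.Delay(10, ct);
    return $"Car {id}";
}
```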

Answer:

First off, List<T> is not thread-safe. If you really want to fill a list from different tasks, you would probably want to use some sort of concurrent collection:

https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent?view=net-6.0
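For example, with ConcurrentQueue<T> from that namespace, each task can enqueue items directly without an explicit lock. This is a sketch with ints standing in for cars. Items from different tasks may be interleaved, so it sorts at the end in case order matters.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var chunks = new List<List<int>>
{
    Enumerable.Range(0, 500).ToList(),
    Enumerable.Range(500, 500).ToList(),
};

// Thread-safe collection: concurrent Enqueue calls need no lock
var collected = new ConcurrentQueue<int>();

Task[] tasks = chunks
    .Select(chunk => Task.Run(() =>
    {
        foreach (var item in chunk)
            collected.Enqueue(item);
    }))
    .ToArray();

Task.WaitAll(tasks);

// Items from the two tasks may be interleaved, so restore order explicitly
List<int> result = collected.OrderBy(x => x).ToList();
Console.WriteLine(result.Count); // 1000
```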

The second question is: why would you want to do this? In your current example, all this work is CPU-bound and trivial anyway, so creating multiple tasks does not really get you anywhere. It's not going to speed anything up. In fact, it will do the opposite, because creating, scheduling and waiting on the tasks adds overhead to the processing.
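For comparison, here is the same job done on a single thread, with no tasks and no lock. It's a sketch that assumes int chunks stand in for cars and that CollectCarList just returns its input, as in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var splittedDataList = new List<List<int>>
{
    Enumerable.Range(0, 500).ToList(),
    Enumerable.Range(0, 500).ToList(),
};

// One thread, no tasks, no lock: the List<T> is only touched by this thread
var objectList = new List<int>();
foreach (var data in splittedDataList)
    objectList.AddRange(CollectCarList(data));

Console.WriteLine(objectList.Count); // 1000

// Placeholder matching the question's CollectCarList: returns its input unchanged
static List<int> CollectCarList(List<int> list) => list;
```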

If your input lists were coming from various other async tasks, e.g. calls to a database, then this might make more sense. In any case, based on what I see above, this would do what you're asking:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Shared lock object: every access to objectList goes through it
    static readonly object ListLock = new object();

    static void Main()
    {
        var splittedDataList = new List<List<int>> { Enumerable.Range(0, 500).ToList(), Enumerable.Range(0, 500).ToList() };

        // Create a list of tasks
        var poolTasks = new List<Task>();
        var objectList = new List<int>();

        for (int i = 0; i < splittedDataList.Count; i++)
        {
            var data = splittedDataList[i];

            poolTasks.Add(Task.Run(() =>
            {
                // Do the per-chunk work outside the lock...
                var cars = CollectCarList(data);

                lock (ListLock)
                {
                    // ...and hold the lock only while mutating the shared List<T>
                    objectList.AddRange(cars);
                }
            }));
        }

        // Wait for all tasks to finish
        Task.WaitAll(poolTasks.ToArray());

        Console.WriteLine(objectList.Count); // 1000
    }

    // Placeholder for the real per-chunk processing
    static List<int> CollectCarList(List<int> list)
    {
        return list;
    }
}

I changed the list to a simple List of int, as I didn't know what the definition of Car was in your application. The lock is required to overcome the thread-safety issue with List<T>. It could be removed if you used some kind of concurrent collection. I just want to reiterate that what this code is doing in its current state is pointless: you would be better off doing all of this on a single thread, unless there is some actual async I/O going on somewhere else.
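When there is real async I/O, a common shape is to start all the calls, await Task.WhenAll, and flatten the returned lists afterwards, so no shared list or lock is needed. This is a sketch that assumes a hypothetical LoadChunkAsync, which simulates a database call with Task.Delay:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Start every (simulated) query up front so they are all in flight together
List<Task<List<int>>> pending = Enumerable.Range(0, 5)
    .Select(id => LoadChunkAsync(id))
    .ToList();

// WhenAll completes when every query has finished; results keep the input order
List<int>[] chunks = await Task.WhenAll(pending);

// Flatten once, on a single thread, so no lock or concurrent collection is needed
List<int> all = chunks.SelectMany(c => c).ToList();

Console.WriteLine(all.Count); // 50

// Hypothetical database call returning one chunk of 10 rows, simulated with a delay
static async Task<List<int>> LoadChunkAsync(int id)
{
    await Task.Delay(50);
    return Enumerable.Range(id * 10, 10).ToList();
}
```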
