I am new to parallel programming in C#. I would really appreciate help on how to make the following serial for loop snippet parallel.
I have looked at the other questions on this topic but they either update simple types or arrays. Also, the questions I found do not have a conditional update on the object outside the for a loop.
Note specifically, contrary to answers to similar questions, I am looking to find a certain OBJECT (not simply a double or int) from the list, based on a property that also needs to be calculated/updated inside the loop, and then, based on that property ('fitness') I need to decide whether or not to update an object outside the loop.
This is for a genetic algorithm:
fitnessSum = 0;
DNA<T> best = Population[0];
//loop over the population
for (int i = 0; i < Population.Count; i )
{
//Get the fitness of the current candidate
double fitnessThisPopulation = Population[i].CalculateFitness(i);
fitnessSum = fitnessThisPopulation;
// If the current candidate has a better fitness, then update best fitness
if (Population[i].fitness > best.fitness)
{
best = Population[i];
}
}
BestFitness = best.fitness;
Any help is much appreciated. As an ex FORTRAN person, I am not an expert in Linq so I would prefer to code more along with the traditional for loop style. Many thanks!
CodePudding user response:
If CalculateFitness
is complex and the number of iterations large, then any partitioning strategy is likely to give you a performance gain. However, if its trivial its going to be harder to get a significant gain as the cost of spinning up tasks and threads are going to outweigh the actual jit optimized iteration of the array.
So this answer isn't about what's faster, who knows, it depends on your system, cores, cpu, data, framework, calculations, and what day of the week it is.... however it shows you how you can test this your self with BenchmarkDotNet. I have included a PLinq (parallel partitioned) calculation for comparison
Given
public class DNA
{
public double fitness;
public double CalculateFitness(int i)
{
// some weird complex sum
var result = fitness;
for (int j = 0; j < 1000; j )
result = result-i;
return result;
}
}
Benchmark Code
[SimpleJob(RuntimeMoniker.Net50)]
[MemoryDiagnoser]
public class Test
{
private DNA[] _population;
[Params(100000, 1000000)] public int N;
[GlobalSetup]
public void Setup()
{
var r = new Random(42);
_population = Enumerable
.Range(0, N)
.Select(x => new DNA
{
fitness = r.Next(0, 10000)
}).ToArray();
}
[Benchmark]
public void New()
{
var fitnessSum = _population
.AsParallel()
.Select((x, i) => x.CalculateFitness(i))
.Sum();
var best = _population
.AsParallel()
.Max(x => x.fitness);
}
[Benchmark]
public void New2()
{
var fitnessSum = _population
.AsParallel()
.Select((x, i) => x.CalculateFitness(i))
.Sum();
var best = _population
.Max(x => x.fitness);
}
[Benchmark]
public void New3()
{
var best = _population[0].fitness;
var fitnessSum = _population
.AsParallel()
.Select((x, i) =>
{
AssignIfNewValueGreater(ref best, _population[i].fitness);
return x.CalculateFitness(i);
})
.Sum();
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void AssignIfNewValueGreater(ref double target, double newValue)
{
// be careful, this method is only good for 64 operating systems
double temp;
do
{
temp = target;
} while (newValue > temp && Interlocked.CompareExchange(ref target, newValue, temp) != temp);
}
[Benchmark]
public void Old()
{
double fitnessSum = 0;
var best = _population[0];
//loop over the population
for (var i = 0; i < _population.Length; i )
{
//Get the fitness of the current candidate
var fitnessThisPopulation = _population[i].CalculateFitness(i);
fitnessSum = fitnessThisPopulation;
//If the current candidate has a better fitness, then update best fitness
if (_population[i].fitness > best.fitness) best = _population[i];
}
var bestFitness = best.fitness;
}
}
Usage
BenchmarkRunner.Run<Test>();
Results
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.100-rc.1.21463.6
[Host] : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT [AttachedDebugger]
.NET 5.0 : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
Job=.NET 5.0 Runtime=.NET 5.0
Method | N | Mean | Error | StdDev | Allocated |
---|---|---|---|---|---|
New | 100000 | 8.082 ms | 0.1616 ms | 0.3582 ms | 27,129 B |
New2 | 100000 | 8.205 ms | 0.1610 ms | 0.2458 ms | 13,008 B |
New3 | 100000 | 7.725 ms | 0.1510 ms | 0.2439 ms | 13,072 B |
Old | 100000 | 148.285 ms | 0.0301 ms | 0.0281 ms | - |
New | 1000000 | 75.982 ms | 1.5130 ms | 2.4432 ms | 27,160 B |
New2 | 1000000 | 76.613 ms | 1.5147 ms | 2.9900 ms | 13,008 B |
New3 | 1000000 | 71.590 ms | 1.4137 ms | 3.1619 ms | 13,072 B |
Old | 1000000 | 1,483.186 ms | 0.5073 ms | 0.4236 ms | 704 B |
Disclaimer : Obviously there are oodles of other parallel strategies you could try (linq based or otherwise) that would be worthy to benchmark, this was just an example
Supplemental : before throwing more cores and threads at a problem, first you need to be sure you have a performance problem, and if you do make sure you are solving that problem as efficiently as possible first. There is no point using more threads to make an inefficient algorithm faster