Get array of elements from list that sum to value, using Parallel.ForEach (multiple threads)


I'm using the code below to get the arrays of elements from a list that sum to a value. However, it only uses one CPU core. My PC has 64 cores, so I want to use 100% of the CPU to speed up the process. Could you please help me update the code to use Parallel.ForEach (multiple threads)?

using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<List<int>> subset_sum(IEnumerable<int> numbers, int target,
    IEnumerable<int> partial = null, int partial_sum = 0)
{
    partial ??= Enumerable.Empty<int>();
    if (partial_sum == target) yield return partial.ToList();
    if (partial_sum >= target) yield break;
    foreach (var (n, i) in numbers.Select((n, i) => (n, i)))
    {
        var remaining = numbers.Skip(i + 1);
        foreach (var subset in subset_sum(
            remaining, target, partial.Append(n), partial_sum + n))
                yield return subset;
    }
}

var numbers = new List<int> { 3, 9, 8, 4, 5, 7, 10 };
var target = 15;
foreach (var subset in subset_sum(numbers, target))
    Console.WriteLine($"[{string.Join(", ", subset)}] = {subset.Sum()}");

CodePudding user response:

Parallelizing recursive algorithms is not easy. It's not obvious how to partition the workload when the function that does the work can create more work based on arbitrary criteria. It's even more difficult when the workload is not balanced, as in this case: some recursive branches go much deeper than others.

Below is an attempt to parallelize this algorithm, using a BlockingCollection<T> as an intermediate queue. The queue is enumerated by a PLINQ query (AsParallel), and more work can be added to the queue during the enumeration. Determining the right moment to mark the queue as adding-completed (CompleteAdding) is tricky. The implementation below uses a pending counter for this purpose, which is updated with Interlocked operations.
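
To make that completion logic easier to follow, here is a stripped-down sketch of the same queue-plus-pending-counter pattern in isolation, with the subset-sum logic replaced by a trivial "each item spawns two children up to a fixed depth" rule. The names and numbers are arbitrary and only illustrate the choreography; the actual implementation follows after it.

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

BlockingCollection<int> queue = new();
int pending = 1;  // one seed item is about to be added
queue.Add(0);     // seed work item (depth 0)

var processed = Partitioner
    .Create(queue.GetConsumingEnumerable(),
        EnumerablePartitionerOptions.NoBuffering)
    .AsParallel()
    .Select(depth =>
    {
        try
        {
            if (depth < 3) // each item spawns two children until depth 3
            {
                // Increment BEFORE adding, so pending can never reach zero
                // while a child is still about to be enqueued.
                Interlocked.Add(ref pending, 2);
                queue.Add(depth + 1);
                queue.Add(depth + 1);
            }
            return depth;
        }
        finally
        {
            // The worker that finishes the last pending item completes the queue,
            // which unblocks GetConsumingEnumerable and ends the query.
            if (Interlocked.Decrement(ref pending) == 0) queue.CompleteAdding();
        }
    });

Console.WriteLine($"Processed {processed.Count()} items."); // 1 + 2 + 4 + 8 = 15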

// Requires: using System.Collections.Concurrent; (BlockingCollection<T>, Partitioner)
//           using System.Threading; (Interlocked)
IEnumerable<IList<int>> FindSubsets(IEnumerable<int> numbers, int target)
{
    if (target == 0) yield return Array.Empty<int>();
    if (target < 0) yield break;

    BlockingCollection<(IEnumerable<int> Numbers, IEnumerable<int> Partial,
        int PartialSum, int Depth)> queue = new();
    int pending = 1;
    queue.Add((numbers, Enumerable.Empty<int>(), 0, 0));

    var results = Partitioner
        .Create(queue.GetConsumingEnumerable(),
            EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .SelectMany(e => FindSubsets_Internal(
            e.Numbers, e.Partial, e.PartialSum, e.Depth, true));

    foreach (var result in results) yield return result;

    IEnumerable<int[]> FindSubsets_Internal(IEnumerable<int> numbers,
        IEnumerable<int> partial, int partialSum, int depth, bool wasScheduled)
    {
        try
        {
            if (partialSum == target) yield return partial.ToArray();
            if (partialSum >= target) yield break;
            foreach (var (n, i) in numbers.Select((n, i) => (n, i)))
            {
                var remaining = numbers.Skip(i + 1);
                if (wasScheduled && depth + i <= 5)
                {
                    Interlocked.Increment(ref pending);
                    queue.Add((remaining, partial.Append(n),
                        partialSum + n, depth + 1));
                }
                else
                {
                    var inner = FindSubsets_Internal(remaining, partial.Append(n),
                        partialSum + n, depth + 1, false);
                    foreach (var subset in inner) yield return subset;
                }
            }
        }
        finally
        {
            if (wasScheduled && Interlocked.Decrement(ref pending) == 0)
                queue.CompleteAdding();
        }
    }
}
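
For reference, a minimal driver for the method above, using the sample data from the question (it assumes FindSubsets and the using directives it needs are in the same top-level program):

var numbers = new List<int> { 3, 9, 8, 4, 5, 7, 10 };
var target = 15;

// Results arrive in non-deterministic order because the merge is not buffered.
foreach (var subset in FindSubsets(numbers, target))
    Console.WriteLine($"[{string.Join(", ", subset)}] = {subset.Sum()}");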

On my quad-core PC this implementation is not faster than the single-threaded implementation. Maybe the condition if (wasScheduled && depth + i <= 5) is not balancing the workload correctly, or the approach might be fundamentally flawed for some other reason. I am really not sure.

CodePudding user response:

This post is not an answer on how to parallelize the subset-finding algorithm. It just shows that a faster single-threaded implementation is possible. The implementation below is not recursive: it uses a Stack<int> (the pending stack) as a mechanism for enumerating all the combinations of the source sequence. On my PC it is about 7-8 times faster than the recursive implementation.

IEnumerable<IList<int>> FindSubsets(IEnumerable<int> source, int target)
{
    int[] sourceArray = source.ToArray();
    if (target == 0) yield return Array.Empty<int>();

    Stack<int> subset = new(sourceArray.Length); // Indices
    int subsetSum = 0;
    Stack<int> pending = new(sourceArray.Length); // Indices
    pending.Push(0);

    while (pending.TryPop(out var index))
    {
        while (subset.Count > pending.Count)
            subsetSum -= sourceArray[subset.Pop()];
        for (; index < sourceArray.Length; index++)
        {
            subset.Push(index);
            subsetSum += sourceArray[index];
            if (subsetSum == target)
                yield return subset.Select(i => sourceArray[i]).ToArray();
            if (subsetSum > target) break;
            if (index + 1 < sourceArray.Length) pending.Push(index + 1);
        }
    }
}
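
A rough way to check the speed difference is to time both single-threaded versions with Stopwatch on a larger random input. The sketch below is only illustrative (the input size, value range and target are arbitrary), and it assumes the iterative FindSubsets above and the recursive subset_sum from the question are defined in the same program:

// Arbitrary test input: 25 random values in 1..49, target sum 100.
var rng = new Random(0);
var input = Enumerable.Range(0, 25).Select(_ => rng.Next(1, 50)).ToList();
var goal = 100;

var sw = System.Diagnostics.Stopwatch.StartNew();
var countIterative = FindSubsets(input, goal).Count();
sw.Stop();
Console.WriteLine($"Iterative: {countIterative} subsets in {sw.ElapsedMilliseconds} ms");

sw.Restart();
var countRecursive = subset_sum(input, goal).Count();
sw.Stop();
Console.WriteLine($"Recursive: {countRecursive} subsets in {sw.ElapsedMilliseconds} ms");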