Algorithm for finding the closest sum across different arrays


I have recently started a game programming job and am trying to write an interesting algorithm in C#. I didn’t study IT, so my maths and coding knowledge are quite basic. The problem should be language agnostic, but if you want examples I can try to copy some.

Here’s a simplified version of my problem:

I have multiple arrays of elements; some may be the same, some may not. There can be between 1 and 8 arrays. Each element has a weight. Different elements can share the same weight; however, the same element should never appear twice in the final result.

The goal is to select one element from each array so that the sum of their weights is as close as possible to a target value, but never under it.

(I have other complexities to deal with, but I think the core issue is summarised here.)

FOR EXAMPLE (for simplicity’s sake, let’s say the element = its weight):

Data:

  • Target weight sum = 590
  • Array A = [100, 300]
  • Array B = [50, 200, 300, 600]
  • Array C = [400, 450, 500, 600]

The expected result here would be:

  • Array Result = [100, 50, 450]
  • Total weight sum = 600 (100 + 50 + 450; since every weight here is a multiple of 50, 600 is the smallest reachable sum at or above 590)

I hope you can see how many edge cases this creates. I’m dealing with data sets of tens of thousands of entries across up to 8 arrays, so brute force isn’t possible.

I’ve iterated over several versions of the algorithm, which mostly work, except in some edge cases, or the result ends up not being the lowest possible sum.

We can honestly live with a result that isn’t the absolute optimum, as long as it meets the sum requirement, but I’m stuck on curiosity and perfectionism with this one… I feel like I’m missing a trick that would make it efficient and easy.

So far my solutions have incorporated:

  • iterating through all the sorted data from bottom to top until the required sum is reached
  • doing the same from top to bottom
  • switching to a binary search to find an approximate result, then optimising by trying lower or higher weights in each array without breaking the sum requirement
  • looking at dynamic programming and the knapsack problem, but I don’t think it maps well to my problem, or I don’t see how.

I would like to hear other opinions, solutions I haven’t thought of, or whether it’s really too hard to find the perfect algorithm for this and I should just get on with it.

Thanks in advance!

CodePudding user response:

How about the knapsack problem after all, but with a little trick on the capacity?

Let the capacity C be the difference between the sum of all non-repeating weights and the requirement. Implement the usual 0-1 knapsack to find the subset whose sum is as large as possible without exceeding C. The complement of that subset is what you need.
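
Here is a minimal sketch of that trick in Python (the helper name complement_knapsack is made up). It pools the distinct weights and returns the kept sum; note that, as described, it does not enforce picking exactly one element per array:

def complement_knapsack(weights, requirement):
    # Pool the distinct weights; keeping each weight at most once also
    # guarantees the kept selection repeats no weight.
    weights = sorted(set(weights))
    total = sum(weights)
    capacity = total - requirement
    if capacity < 0:
        return None  # Even taking every weight falls short.
    # Standard 0-1 subset-sum DP: reachable[s] is True if some subset
    # of the pooled weights sums to exactly s.  This is pseudo-polynomial:
    # O(len(weights) * capacity) time, O(capacity) memory.
    reachable = [False] * (capacity + 1)
    reachable[0] = True
    for w in weights:
        for s in range(capacity, w - 1, -1):
            if reachable[s - w]:
                reachable[s] = True
    # Removing the largest reachable sum <= capacity keeps the smallest
    # sum that is still >= the requirement.
    best_removed = max(s for s in range(capacity + 1) if reachable[s])
    return total - best_removed

# The example from the question: prints 600.
print(complement_knapsack(
    [100, 300] + [50, 200, 300, 600] + [400, 450, 500, 600], 590))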

Aborted:

You have probably already got it.

Merge the arrays into a single array S of n weights. Sort S such that S[0] <= S[1] <= ... <= S[n - 1]. Iterate through S, accumulating weights until the sum is larger than or equal to the requirement; if S[i] is equal to S[i - 1], do not add it and continue. It should be easy to prove that the sum is the closest to the requirement when it terminates. Apart from the sorting, which is O(n log n), the scan is only O(n), which should be sufficient for your need.
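
A direct transcription of that sketch, for reference (the helper name greedy_above is made up); as written it also ignores the one-element-per-array constraint:

def greedy_above(requirement, *arrays):
    # Merge all the arrays and sort the weights ascending.
    s = sorted(w for arr in arrays for w in arr)
    total, picked = 0, []
    for i, w in enumerate(s):
        if i > 0 and w == s[i - 1]:
            continue  # If S[i] equals S[i - 1], do not sum and continue.
        total += w
        picked.append(w)
        if total >= requirement:
            return picked, total
    return None  # Unreachable even with every distinct weight.

# On the question's example this returns ([50, 100, 200, 300], 650).
print(greedy_above(590, [100, 300], [50, 200, 300, 600], [400, 450, 500, 600]))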

CodePudding user response:

This is made much trickier by, "there shouldn’t be twice the same element in the final result." So I'll start without immediately considering that condition, then take care of it at the end.

I'm not a C# programmer, so I'll do it in Python instead. You'll have to translate it yourself.


First, let's walk through your example by hand. We'll use dynamic programming to answer the question, "For a particular final sum, what are my possible next choices?" That is, we want dictionaries D0, D1, D2 allowing us to do something like this in your example:

D0 -> choose from A -> D1 -> choose from B -> D2 -> choose from C -> answer

We will actually construct them backwards. D2 is trivial to generate from C:

D2 = {
    400: [400],
    450: [450],
    500: [500],
    600: [600]
}

We generate D1 from B and D2:

D1 = {
    450: [50],
    500: [50],
    550: [50],
    600: [200],
    650: [50, 200],
    700: [200, 300],
    750: [300],
    800: [200, 300],
    900: [300],
    1000: [600],
    1050: [600],
    1100: [600],
    1200: [600],
}

For example, you can get to 650 by (50, 600) or by (200, 450). Therefore the options for a next choice from B are [50, 200].

And finally D0 for the first choice from A:

D0 = {
    550: [100],
    600: [100],
    650: [100],
    700: [100],
    750: [100, 300],
    800: [100, 300],
    850: [100, 300],
    900: [100, 300],
    950: [300],
    1000: [100, 300],
    1050: [300],
    1100: [100, 300],
    1150: [100],
    1200: [100, 300],
    1300: [100, 300],
    1350: [300],
    1400: [300],
    1500: [300]
}

And now that we have that, we can simply do a recursive search.

From D0 we skip the key 550 because it is below the target.
From D0 we try 600 and choose 100.
    From D1, for 500, we choose 50.  (Not 100.)
        From D2, for 450, we choose 450.  (Not 100 or 50.)
            FOUND THE ANSWER.

This recursive search is where we finally enforce the condition that we can't pick the same weight twice.


OK, how do we do this in code?

def minimal_for_target(target, *arrays):
    # Construct the last dictionary (D2 in the example above).
    transition = {}
    for value in arrays[-1]:
        transition[value] = [value]

    # We build the rest backwards, i.e. as [D2, D1, D0].
    transitions = [transition]
    for i in reversed(range(len(arrays) - 1)):
        last_transition = transitions[-1]
        transition = {}
        for value in arrays[i]:
            for sum_remaining in last_transition.keys():
                this_sum = value + sum_remaining
                if this_sum in transition:
                    transition[this_sum].append(value)
                else:
                    transition[this_sum] = [value]
        transitions.append(transition)

    transitions = list(reversed(transitions)) # Reorder to [D0, D1, D2]

    # A helper function for the recursion.  Returns True if it extended
    # `chosen` to a full, duplicate-free selection (mutating it in place).
    def find(new_target, chosen):
        i = len(chosen)
        if i == len(arrays):
            return True # Found a full set of choices
        else:
            for value in transitions[i][new_target]:
                if value not in chosen: # Enforce the no-duplicate rule here.
                    chosen.append(value)
                    if find(new_target - value, chosen):
                        return True # Recursively found it
                    chosen.pop() # Remove value from chosen
            return False # Failed to find it.

    # Try the possible sums in increasing order, starting at the target.
    for total_sum in sorted(transitions[0].keys()):
        if target <= total_sum:
            for value in transitions[0][total_sum]:
                # Try to find it.
                answer = [value]
                if find(total_sum - value, answer):
                    return answer # Filled it in recursively
    return None # No answer could be found.

print(minimal_for_target(590, [100, 300], [50, 200, 300, 600], [400, 450, 500, 600]))
# Prints [100, 50, 450]

This code will run well in most cases. You'll find the likely sums quickly, and the recursive no-duplicates check will hopefully succeed fast. But it still has some exponential disasters; the following is a good example.

minimal_for_target(
    1000,
    [1000],
    list(range(100)),
    list(range(100)),
    list(range(100)),
    list(range(100)),
    list(range(100)),
    [1000, 2000]
)

It has to run through a lot of possibilities that try to use 1000 twice before it even starts on values that have a chance.

Hopefully this is good enough. There are techniques that can handle even those edge cases, but they are going to be slower and more wasteful of memory on average.
