Home > front end >  Largest subset from set of numbers (also negative) having sum equal 0
Largest subset from set of numbers (also negative) having sum equal 0

Time:04-28

I have a huge set of numbers with a defined order. The logic in simple terms would look like this:

data['values'] = [1,1,3,4,4,-9,10]

data['order'] = [1,2,3,4,5,6,7]

ExpectedSum = 0

What I wish to return is the original order and values of biggest possible subset of values that we can get with total sum equal 0. For this case one optimal solution would be

solution['values'] = [1,1,3,4,-9]

solution['order'] = [1,2,3,4,6]

The sum could be also achieved by replacing 4th order number with 5th order number, however, one optimal solution is enough. The goal is to reach maximum possible size of subset with total sum =0.

Was looking for variations of Knapsack problem and maximum subarray algorithms but none met my needs.

Any hints or directions appreciated.

CodePudding user response:

All subsets of fixed length can be found with the "n choose k"-way. To find the longest the iteration starts with k=n and decreases by one.

import itertools as it

def combination_finder(l: list) -> tuple:
    for i in range(len(l)):
        for pair in it.combinations(enumerate(l), len(l)-i):
            i, c = zip(*pair)
            if sum(c) == 0:
                return (i, c)
    
     raise Exception('No combination found.')

l = [1,1,3,4,4,-9,10]
id_, comb = combination_finder(l)
print(id_)
print(comb)

Remark: to make the id starting from 1, simply change the enumerate as follow enumerate(l, start=1)

CodePudding user response:

Maybe I'm missing something (it's getting late), but if we denote a subset with k elements as S(k) and we have N elements in total, you could:

  • see if S(N) sums to 0; if so, that's the largest subset
  • if not, see if any of S(N-1) sums to 0; there are N such sets
  • if not, see if any of S(N-2) does; there are N*(N-1) such sets

and so on. This is a "brute force" solution and probably far from optimal, but if the largest subset is expected to be relatively large (close to N in size) it shouldn't be too bad. Note that each step can utilize the sums computed in the preceding step.

Your solution[order] seems to be the indices to the solution subset. It can be done of course, but I'm not sure why you need to get both the values and their indices? It's kind of redundant.

Finally, while doable in pure Python, the NumPy library might be useful for this kind of problem.

  • Related