Find all combinations that add up to given number python with list of lists


I've seen plenty of threads on how to find all combinations that add up to a number with a single list, but I wanted to know how to extend this to the case where you pick exactly one number from each list in a list of lists.

Question:
You must select one number from each list; how do you find all combinations that sum to N?

Given:
3 lists of differing fixed lengths [e.g. l1 will always have 6 values, l2 will always have 10 values, etc]:

l1 = [0.013,0.014,0.015,0.016,0.017,0.018]
l2 = [0.0396,0.0408,0.042,0.0432,0.0444,0.045,0.0468,0.048,0.0492,0.0504]
l3 = [0.0396,0.0408]

Desired Output:
If N = 0.0954 then the output is [0.015, 0.0396, 0.0408] and [0.015, 0.0408, 0.0396].

What I have tried:

output = sum(list(product(l1,l2,l3,l4,l5,l6,l7,l8)))

However this is too intensive as my largest bucket has 34 values, creating too many combinations.

Any help/tips on how to approach this in a more efficient manner would be greatly appreciated!

CodePudding user response:

Here is a straightforward dynamic programming solution: I build a data structure that encodes all the answers, then generate the answers from that data structure.

from dataclasses import dataclass
from decimal import Decimal
from typing import Any

@dataclass
class SummationNode:
    value: Decimal
    solution_tail: Any = None
    next_solution: Any = None

    def solutions(self):
        if self.value is None:
            yield []
        else:
            for rest in self.solution_tail.solutions():
                rest.append(self.value)
                yield rest

        if self.next_solution is not None:
            yield from self.next_solution.solutions()


def all_combinations(target, *lists):
    solution_by_total = {
        Decimal(0): SummationNode(None)
    }

    for l in lists:
        old_solution_by_total = solution_by_total
        solution_by_total = {}
        for x_raw in l:
            x = Decimal(str(x_raw)) # Deal with rounding.
            for prev_total, prev_solution in old_solution_by_total.items():
                next_solution = solution_by_total.get(x + prev_total)
                solution_by_total[x + prev_total] = SummationNode(
                    x, prev_solution, next_solution
                    )
    return solution_by_total.get(Decimal(str(target)))

l1 = [0.013,0.014,0.015,0.016,0.017,0.018]
l2 = [0.0396,0.0408,0.042,0.0432,0.0444,0.045,0.0468,0.048,0.0492,0.0504]
l3 = [0.0396,0.0408]
for answer in all_combinations(0.0954, l1, l2, l3).solutions():
    print(answer)
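
Two usage notes: all_combinations returns None when no combination reaches the target (since dict.get is used), and the yielded solutions are lists of Decimal values. A minimal sketch with both points handled (the float conversion here is mine, purely for display):

result = all_combinations(0.0954, l1, l2, l3)
if result is not None:  # .get() returns None if the target is unreachable
    for answer in result.solutions():
        print([float(v) for v in answer])  # e.g. [0.015, 0.0396, 0.0408]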

CodePudding user response:

My solution

So my attempt with Branch&Bound


def bb(target):
    L=[l1,l2,l3,l4,l5,l6,l7,l8]
    mn=[min(l) for l in L]
    mx=[max(l) for l in L]
    return bbrec([], target, L, mn, mx)
    
eps=1e-9

def bbrec(sofar, target, L, mn, mx):
    if len(L)==0:
        if target<eps and target>-eps: return [sofar]
        return []
    if sum(mn)>target+eps: return []
    if sum(mx)<target-eps: return []
    res=[]
    for x in L[0]:
        res += bbrec(sofar+[x], target-x, L[1:], mn[1:], mx[1:])
    return res

Note that it is clearly not optimized. For example, it might be faster, to avoid list appending, to work with an 8-element list from the start (for example, sofar pre-filled with None slots), or to create an iterator (yielding results as we find them, rather than appending them to a list).
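
For illustration, here is a rough sketch of that first idea: a fixed-size sofar that is filled in place instead of rebuilt with sofar+[x] at every level (the name bbrec_slots and the idx argument are not part of the code above, just one possible way to write it):

def bbrec_slots(sofar, idx, target, L, mn, mx):
    # Same pruning as bbrec, but sofar is a single preallocated list reused across calls.
    if idx == len(L):
        return [sofar[:]] if -eps < target < eps else []
    if sum(mn[idx:]) > target+eps: return []
    if sum(mx[idx:]) < target-eps: return []
    res = []
    for x in L[idx]:
        sofar[idx] = x
        res += bbrec_slots(sofar, idx+1, target-x, L, mn, mx)
    return res

# called as: bbrec_slots([None]*len(L), 0, target, L, mn, mx)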

But as is, it is already 40 times faster than the brute force method on my generated data (giving the exact same result). Which is something, considering that this is pure Python, while brute force can use my beloved itertools (which is also Python, of course, but its iterations are done faster, since they happen inside the implementation of itertools rather than in Python code).

And I must confess brute force was faster than I expected. But still 40 times too slow.

Explanation

The general principle of branch and bound is to enumerate all possible solutions recursively (the reasoning being "there are len(l1) sorts of solutions: those containing l1[0], those containing l1[1], ...; and among the first category, there are len(l2) sorts of solutions, ..."). Which, so far, is just another implementation of brute force. Except that, during the recursion, you can cut whole branches (whole subsets of all candidates) if you know that finding a solution is impossible from where you are.

It is probably clearer with an example, so let's use yours.

bbrec is called with

  • a partial solution (starting with an empty list [], and ending with a list of 8 numbers)
  • a target for the sum of remaining numbers
  • a list of lists from which we must take numbers (so at the beginning, your 8 lists; once we have chosen the 1st number, the 7 remaining lists; etc.)
  • a list of minimum values of those lists (8 numbers at first, being the 8 minimum values)
  • a list of maximum values

It is called at first with ([], target, [l1,...,l8], [min(l1),...,min(l8)], [max(l1),...,max(l8)])

And each call is supposed to choose a number from the first list, and call bbrec recursively to choose the remaining numbers.

The eighth recursive call will be done with sofar being a list of 8 numbers (a solution, or candidate), target being what we still have to find in the rest (and since there is no rest, it should be 0), and L, mn, and mx being empty lists. So when we see that we are in this situation (that is, len(L)==len(mn)==len(mx)==0, or len(sofar)==8; any of those 4 criteria are equivalent), we just have to check whether the remaining target is 0. If so, then sofar is a solution. If not, then it is not.

If we are not in that situation, that is, if there are still numbers to choose for sofar, bbrec just chooses the first one, by iterating over all possibilities from the first list, and, for each of those, calls itself recursively to choose the remaining numbers.

But before doing so (and those are the 2 lines that make B&B useful; otherwise it is just a recursive implementation of the enumeration of all 8-tuples from the 8 lists), we check whether there is at least a chance of finding a solution there.

For example, if you are calling bbrec([1,2,3,4], 12, [[1,2,3],[1,2,3], [5,6,7], [8,9,10]], [1,1,5,8], [3,3,7,10]) (note that mn and mx are redundant information. They are just min and max of the lists. But no need to compute those min and max over and over again)

So, if you are calling bbrec like this, that means that you have already chosen 4 numbers, from the 4 first lists. And you need to choose 4 other numbers, from the 4 remaining lists that are passed as the 3rd argument.

And the total of the 4 numbers you still have to choose must be 12.

But, you also know that any combination of 4 numbers from the 4 remaining lists will sum to a total between 1+1+5+8=15 and 3+3+7+10=23.

So, no need to even bother enumerating all the solutions starting with [1,2,3,4] and continuing with 4 numbers chosen from [1,2,3], [1,2,3], [5,6,7], [8,9,10]. It is a lost cause: no choice of the remaining 4 numbers will result in a total of 12 anyway (they all give a total of at least 15).
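
To make the pruning concrete, here is the same test written out with the numbers of this example (nothing new, just the two bound checks from bbrec, with eps omitted since these are integers):

mn = [1, 1, 5, 8]    # minimums of the 4 remaining lists
mx = [3, 3, 7, 10]   # maximums of the 4 remaining lists
target = 12
print(sum(mn), sum(mx))   # 15 23: any completion sums to something between 15 and 23
print(sum(mn) > target)   # True, so the whole branch is cut without enumerating it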

And that is what explains why this algorithm can beat, by a factor of 40, an itertools-based solution, while using only naive manipulation of lists and for loops.

Brute force solution

If you want to run the comparison yourself on your example, here is the brute force solution (already given in the comments):

import itertools, math

def brute(target):
    return [k for k in itertools.product(l1,l2,l3,l4,l5,l6,l7,l8) if math.isclose(sum(k), target)]

Generator version

Not really faster. But at least, if the idea is not to build a list of all solutions but to iterate through them, this version allows you to do so (and it is very slightly faster). And since we talked about generators vs lists in the comments...

eps=1e-9
def bb(target):
    L=[l1,l2,l3,l4,l5,l6,l7,l8]
    mn=[min(l) for l in L]
    mx=[max(l) for l in L]
    return list(bbit([], target, L, mn, mx))
def bbit(sofar, target, L, mn, mx):
    if len(L)==0:
        if target<eps and target>-eps:
            yield sofar
        return
    if sum(mn)>target+eps: return
    if sum(mx)<target-eps: return
    for x in L[0]:
        yield from bbit(sofar+[x], target-x, L[1:], mn[1:], mx[1:])

Here, I use it just to build a list (so, no advantage over the first version).

But if you wanted to just print solutions, for example, you could

for sol in bbit([], target, L, mn, mx):
    print(sol)

Which would print all solutions, without building any list of solutions.

Example lists

Just for btilly, or those who would like to test their method against the same lists I've used, here are the ones I've chosen:

import numpy as np

l1=list(np.arange(0.013, 0.019, 0.001))
l2=list(np.arange(0.0396, 0.0516, 0.0012))
l3=[0.0396, 0.0498]
l4=list(np.arange(0.02, 0.8, 0.02))
l5=list(np.arange(0.001, 0.020, 0.001))
l6=list(np.arange(0.021, 0.035, 0.001))
l7=list(np.arange(0.058, 0.088, 0.002))
l8=list(np.arange(0.020, 0.040, 0.005))
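
If you want to reproduce the comparison, a rough timing sketch could look like this (the choice of target is mine: summing one known pick from each list guarantees that at least one solution exists; the exact speed-up will of course depend on the data):

import timeit

L = [l1, l2, l3, l4, l5, l6, l7, l8]
target = sum(l[0] for l in L)  # reachable by construction

print(timeit.timeit(lambda: bb(target), number=1))     # branch and bound
print(timeit.timeit(lambda: brute(target), number=1))  # brute force, much slower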

CodePudding user response:

Non-recursive solution:

from itertools import accumulate, product
from sys import float_info

def test(lists, target):
    # will return a list of 2-tuples, containing sum and elements making it
    convolutions = [(0,())]
    # lower_bounds[i] - what is the least gain we'll get from remaining lists
    lower_bounds = list(accumulate(map(min, lists[::-1])))[::-1][1:] + [0]
    # upper_bounds[i] - what is the max gain we'll get from remaining lists
    upper_bounds = list(accumulate(map(max, lists[::-1])))[::-1][1:] + [0]
    for l, lower_bound, upper_bound in zip(lists, lower_bounds, upper_bounds):
        convolutions = [
            # update sum and extend the list for viable candidates
            (accumulated + new_element, elements + (new_element,))
            for (accumulated, elements), new_element in product(convolutions, l)
            if lower_bound - float_info.epsilon <= target - accumulated - new_element <= upper_bound + float_info.epsilon
        ]

    return convolutions

Output of test([l1, l2, l3], 0.0954):

[(0.09540000000000001, (0.015, 0.0396, 0.0408)),
 (0.09540000000000001, (0.015, 0.0408, 0.0396))]

This can be further optimized by sorting lists and slicing them based on upper/lower bound using bisect:

from bisect import bisect_left, bisect_right
# ...

convolutions = [
    (partial_sum + new_element, partial_elements + (new_element,))
    for partial_sum, partial_elements in convolutions
    for new_element in l[bisect_left(l, target-upper_bound-partial_sum-float_info.epsilon):bisect_right(l, target-lower_bound-partial_sum+float_info.epsilon)]
]
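
For completeness, here is one way the bisect variant could be assembled into a full function (a sketch: the name test_bisect is not from the answer above, and the lists are sorted inside the function since bisect requires sorted input):

from bisect import bisect_left, bisect_right
from itertools import accumulate
from sys import float_info

def test_bisect(lists, target):
    lists = [sorted(l) for l in lists]  # bisect only works on sorted lists
    convolutions = [(0, ())]
    lower_bounds = list(accumulate(map(min, lists[::-1])))[::-1][1:] + [0]
    upper_bounds = list(accumulate(map(max, lists[::-1])))[::-1][1:] + [0]
    for l, lower_bound, upper_bound in zip(lists, lower_bounds, upper_bounds):
        convolutions = [
            (partial_sum + new_element, partial_elements + (new_element,))
            for partial_sum, partial_elements in convolutions
            # keep only elements that leave target - sum within [lower_bound, upper_bound]
            for new_element in l[
                bisect_left(l, target - upper_bound - partial_sum - float_info.epsilon):
                bisect_right(l, target - lower_bound - partial_sum + float_info.epsilon)
            ]
        ]
    return convolutions

# e.g. test_bisect([l1, l2, l3], 0.0954) should give the same two combinations as above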