Get combinations from values in array for which the sum doesn't exceed a value-CodePudding

I am trying to obtain all the combinations of values in a list, for which the sum of the corresponding values in another list do not exceed a certain limit. As an example, I have certain jobs that I want to produce in a batch, but that batch has a limited capacity, meaning that only certain combinations of jobs can be put together so that their size don't exceed the capacity of the batch.

Imagining I have jobs defined as job 1, job 3, and job 5, I have the following list:

jobs = [1,3,5]

If job 1 has size 1, job 3 has size 3, and job 5 has size 2, I have the following list:

jobSize = [1,3,2]

If my batch has maximum capacity of 5, this means the desired output is the following:

combinations = [(1),(3),(5),(1,3),(1,5),(3,5)] 
#here I use the name/number of the jobs, and (1,3,5) not included cause total size of 5 is exceeded

From what I have seen, something similar (not exactly the same) is possible to do using combinations from itertools, like explained here Powersets in Python using itertools or here How to get all possible combinations of a list’s elements?, but this is definitely not as efficient as I would like.

I also tried something with itertools.product and np.where, like explained here Python Numpy: All combinations from array or scalar by Corralien, but once again, this is not exactly what I need, and requires a lot of post-processing operations, ending up being too slow.

My idea is that this is possible to do somehow with numpy arrays, because it is the only way I can see this being fast. Does anyone have any idea?

CodePudding user response：

You can approach this using a 2D matrix of 0/1 where 1 indicates that the job is selected to run:

from sklearn.utils.extmath import cartesian

jobs = [1,3,5,7,9]
jobSize = [1,3,2,2,5]
target = 5

# produce all combinations of 0/1
# 1 column per job
m = cartesian([[0,1]]*len(jobSize))

# compute sum of job costs
total = (jobSize*m).sum(axis=1)

# get combinations with total cost ≤ target
idx = np.nonzero(total<=target)[0]
out = m[idx]

Output:

# job   1  3  5  7  9
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 0, 1, 1, 0],
       [1, 1, 0, 0, 0]])

If you want the output as lists, you can further use itertools.compress:

from itertools import compress
out = [list(compress(jobs, l)) for l in m[idx]]

output:

[[], [9], [7], [5], [5, 7], [3], [3, 7], [3, 5], [1], [1, 7], [1, 5], [1, 5, 7], [1, 3]]

CodePudding user response：

I think the easiest approach is to use a recursive function. You take one job away from your list, and run the function on the remaining sublist. Then, you combine the results from the sublist with the first job accordingly.

In this way, if a combination is invalid – for example (1,2,3) when max is 5 – it will not show up in the results from the sublist, and so all longer combinations containing (1,2,3) are eliminated.

This could be improved by:

changing from recursive to iterative
using multiprocessing. For example instead of taking 1 job – taking 3 first jobs, this gives 6 combinations which would need to be combined with the results from the sublist processing. This could be seperated into processes running at the same time
take advantage of the fact that jobs may have the same jobSize. In which case the results are symmetric in regards to these jobs

Code:

import sys
import time
from collections import namedtuple

def print_job(self):
    #return f'( /{self.job_name}/ {self.size} )'
    return str(self.job_name) ' (' str(self.size) ')'
def print_solution(self):
    return ' '.join([str(job) for job in self.jobs]) #  '\n'
Job = namedtuple('Job', ['job_name','size'])
Solution = namedtuple('Solution', ['jobs', 'Size'])
Job.__repr__ = print_job
Solution.__repr__ = print_solution

########### main function #################################
def find_sets(job_list, max_size):
    first_job = job_list[0]
    new_solutions = []
    if len(job_list) > 1:
        other_solutions = find_sets(job_list[1:], max_size)
        for solution in other_solutions:
            if solution.Size   first_job.size <= max_size:
                new_solution = Solution(solution.jobs   [first_job], solution.Size   first_job.size)
                new_solutions.append(new_solution)
    else:
        other_solutions = []
    if first_job.size <= max_size:
            new_solutions.append(Solution([first_job], first_job.size))
    return other_solutions   new_solutions


def find_all_job_sets(jobs='ABC', jobSize = (1,3,2), max_size=5, show_progress=True):
    sys.setrecursionlimit(1500)
    if (len(jobs) != len(jobSize)):
        print('Number of jobs and jobSizes must match.')
        exit(1)
    jobList = list(Job(name, size) for name, size in zip(jobs, jobSize))
    start = time.time()
    results = find_sets(jobList, max_size)
    end = time.time()
    print(results)
    print(f'Number of sets: {len(results)}')
    print(f'Time: {int((end - start) * 100) / 100} seconds')

find_all_job_sets(jobs='ABCDE', jobSize=[1,2,3,4,5], max_size=5)

Output:

[E (5), D (4), C (3), C (3) B (2), B (2), D (4) A (1), C (3) A (1), B (2) 
A (1), A (1)]
Number of sets: 9

List length 100 with max_size=10:

find_all_job_sets(jobs=[f'J{i}' for i in range(100)],
                  jobSize=10*[1,2,3,4,5,6,7,8,9,10], max_size=10)

Output:

...
J0 (1), J50 (1) J11 (2) J10 (1) J2 (3) J1 (2) J0 (1), J40 (1) J11 (2) J10 (1) J2 (3) J1 (2) J0 (1), J30 (1) J11 (2) J10 (1) J2 (3) J1 (2) J0 (1), J20 (1) J11 (2) J10 (1) J2 (3) J1 (2) J0 (1), J11 (2) J10 (1) J2 (3) J1 (2) J0 (1), J10 (1) J2 (3) J1 (2) J0 (1), J3 (4) J2 (3) J1 (2) J0 (1), J2 (3) J1 (2) J0 (1), J1 (2) J0 (1), J0 (1)]
Number of sets: 476465
Time: 1.44 seconds

Long list of 1000 jobs (max_size = 4):

find_all_job_sets(jobs=[f'J{i}' for i in range(1000)],
                  jobSize=100*[1,2,3,4,5,6,7,8,9,10], max_size=4)

Output:

...
J110 (1) J1 (2) J0 (1), J100 (1) J1 (2) J0 (1), J90 (1) J1 (2) J0 (1), J80 (1) J1 (2) J0 (1), J70 (1) J1 (2) J0 (1), J60 (1) J1 (2) J0 (1), J50 (1) J1 (2) J0 (1), J40 (1) J1 (2) J0 (1), J30 (1) J1 (2) J0 (1), J20 (1) J1 (2) J0 (1), J10 (1) J1 (2) J0 (1), J1 (2) J0 (1), J0 (1)]
Number of sets: 4608225
Time: 122.6 seconds