how to find possible combinations-CodePudding

I have an array of 16 columns and 22 rows.

please advice how to find all possible combinations with the following rules:

I can't choose more than 1 value per row
sum of column's indexes should be equal to 16 (1 1 14,15 1,2 2 2 2 2 6 etc.)
sum of values in selected cells should be between 39 and 42

For example I've choose 5 combinations:

1st: Sum of values(8.4 4.8 5.7 4 3.6 3.8=41.3) Sum of column values = 3 3 2 2 2 2 2=16
2nd: sum of values(36.7) Sum of column values = 3 3 3 3 3 1 = 16
3rd: sum of values (40.8) sum of column values = 3 3 3 7 = 16
4th: sum of values (41.7) sum of column values = 3 3 3 7 = 16
5h sum of values(41) sum of column values = 16 = 16
etc.

I've found how to get all possible column combinations: import itertools

def summs(answer, *dig):

    res = []
    for i in range(1, answer   1):
        for j in itertools.combinations_with_replacement(list(dig), i):
            if sum(list(j)) == answer:
                res.append(j)
    return res

Now I have 231 different column indexes combinations each sum is equal to 16.

[(1, 15), (2, 14), (3, 13), (4, 12), (5, 11), (6, 10), (7, 9), (8, 8), (1, 1, 14), (1, 2, 13), (1, 3, 12), (1, 4, 11), (1, 5, 10), (1, 6, 9), (1, 7, 8), (2, 2, 12), (2, 3, 11), (2, 4, 10), (2, 5, 9)... etc.

Now I'm trying to find all row sum combinations for above column index combinations:

for example for (1,15) array (2 columns, 22 rows) I should find all sum combinations, except the same row sums (row 1 and 2, 2 and 3 etc)

for (1, 5, 10) its array of 3 columns, 22 rows

and so on until (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) which is array of 16 columns, 22 rows

source table looks like this:

red colors are numbers grater than 42

other 5 colors are possible combinations with a sum between 39 and 42

	col1	col2	col3	col4	col5
row1	2	4	7	9	11
row2	3	6	8	11	14
row3	2	5	7	10	12
row4	3	6	9	11	14
row5	2	4	6	8	10
row6	2	4	5	7	9

Condition 1, sum of column index is 5: Possible combinations are [(5),(1, 4), (2, 3), (1, 1, 3), (1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1)], this means that you can pick value from 5th column or 1 value from col1 & 1 value from col4, or 2 values from col1, 1 value col3, 1 value col1 and 2values from col2, 3 values from col1 and 1 value from col2, etc
Condition 2: find all possible sum combinations between column's items except cells numbers in same row(for example, for (1,4): 2 9 in the same row need to skip it,2 11, 2 10, 2 11,2 8,2 7; 3 9,3 11 in the same row, teed to skip it 3 10, 3 11,3 8,3 7 etc.)
Condition 3:sum of items should not exceed 11 for example, so I should accept only 2 8, 2 7 and 3 7 from the previous step

Output should be like (row5,col5), (row6,col5), [(row1,col1)&(row5,col4)], [(row1,col1)&(row6,col4) etc

CodePudding user response：

It's very inefficient to generate all possible combinations of row/columns and then check the sums for the generated list. The number of required tests is extremely large. For example, if you want to test just 3 columns (2, 5, 9) in all possible rows, then you can select rows in 22!/(3!*(22-3)!) = 1540 ways. And if you want to test 11 columns (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6) in all possible rows, then you can select rows in 22!/(11!*11!) = 705432 ways. So, this way you'll need to test dozens of millions of combinations.

Instead, you can make a recursive solution, failing quickly when you know that a solution is not possible:

import pandas as pd

def find_combinations(df, from_row, current_sum, current_column_sum,
         target_sum_min, target_sum_max, target_column_sum, cells_list):
    """    
    :param df: the dataframe  
    :param from_row: zero-based row in which to perform the search
    :param current_sum: current sum of selected cells above this row
    :param current_column_sum: current sum of column numbers
    :param target_sum_min: minimum required sum of cells 
    :param target_sum_max: maximum required sum of cells
    :param target_column_sum: required sum of column numbers
    :param cells_list: list of selected cells above this row  
    """

    if from_row >= len(df) or \
       current_sum > target_sum_max or \
       current_column_sum >= target_column_sum :
       return

    max_column = max(len(df.columns), target_column_sum - current_column_sum)
    for column in range(max_column):
        number = df.iloc[from_row][column]
        new_sum = current_sum   number

        if new_sum > target_sum_max:
            break

        if target_sum_min <= new_sum <= target_sum_max and \
            current_column_sum   column   1 == target_column_sum:
            print([*cells_list, (from_row   1, column   1)])

        find_combinations(df, from_row   1, new_sum, current_column_sum   column   1,
            target_sum_min, target_sum_max, target_column_sum,
            [*cells_list, (from_row   1, column   1)])

    find_combinations(df, from_row   1, current_sum, current_column_sum,
        target_sum_min, target_sum_max, target_column_sum,
        cells_list)



df = pd.read_csv("data.csv", header=None)

find_combinations(df, 0, 0.0, 0, 39.0, 42.0, 16, [])

PS. I used OCR to extract text data from your image, so my data.csv file just contains data without any header line.