Finding the best combination of elements with the max total parameter value-CodePudding

I have 100 elements. Each element has 4 features A,B,C,D. Each feature is an integer.

I want to select 2 elements for each feature, so that I have selected a total of 8 distinct elements. I want to maximize the sum of the 8 selected features A,A,B,B,C,C,D,D.

A greedy algorithm would be to select the 2 elements with highest A, then the two elements with highest B among the remaining elements, etc. However, this might not be optimal, because the elements that have highest A could also have a much higher B.

Do we have an algorithm to solve such a problem optimally?

CodePudding user response：

This can be solved as a minimum cost flow problem. In particular, this is an Assignment problem

First of all, see that we only need the 8 best elements of each features, meaning 32 elements maximum. It should even be possible to cut the search space further (as if the 2 best elements of A is not one of the 6 best elements of any other feature, we can already assigne those 2 elements to A, and each other feature only needs to look at the first 6 best elements. If it's not clear why, I'll try to explain further).

Then we make the vertices S,T and Fa,Fb,Fc,Fd and E1,E2,...E32, with the following edge :

for each vertex Fx, an edge from S to Fx with maximum flow 2 and a weight of 0 (as we want 2 element for each feature)
for each vertex Ei, an edge from Fx to Ei if Ei is one of the top elements of feature x, with maximum flow 1 and weight equal to the negative value of feature x of Ei. (negative because the algorithm will find the minimum cost)
for each vertex Ei, an edge from Ei to T, with maximum flow 1 and weight 0. (as each element can only be selected once)

I'm not sure if this is the best way, but It should work.

CodePudding user response：

As suggested per @AloisChristen, this can be written as an assignment problem:

On the one side, we select the 8 best elements for each feature; that's 32 elements or less, since one element might be in the best 8 for more than one feature;
On the other side, we put 8 seats A,A,B,B,C,C,D,D
Solve the resulting assignment problem.

Here the problem is solved using scipy's linear_sum_assignment optimization function:

from numpy.random import randint
from numpy import argpartition, unique, concatenate
from scipy.optimize import linear_sum_assignment

# PARAMETERS

n_elements = 100
n_features = 4
n_per_feature = 2

# RANDOM DATA

data = randint(0, 21, (n_elements, n_features))  # random data with integer features between 0 and 20 included

# SELECT BEST 8 CANDIDATES FOR EACH FEATURE

n_selected = n_features * n_per_feature
n_candidates = n_selected * n_features
idx = argpartition(data, range(-n_candidates, 0), axis=0)
idx = unique(idx[-n_selected:].ravel())
candidates = data[idx]
n_candidates = candidates.shape[0]

# SOLVE ASSIGNMENT PROBLEM

cost_matrix = -concatenate((candidates,candidates), axis=1)  # 8 columns in order ABCDABCD
element_idx, seat_idx = linear_sum_assignment(cost_matrix)
score = -cost_matrix[element_idx, seat_idx].sum()

# DISPLAY RESULTS

print('SUM OF SELECTED FEATURES: {}'.format(score))
for e,s in zip(element_idx, seat_idx):
    print('{:2d}'.format(idx[e]),
          'ABCDABCD'[s],
          -cost_matrix[e,s],
          data[idx[e]])

Output:

SUM OF SELECTED FEATURES: 160
 3 B 20 [ 5 20 14 11]
 4 A 20 [20  9  3 12]
 6 C 20 [ 3  3 20  8]
10 A 20 [20 10  9  9]
13 C 20 [16 12 20 18]
23 D 20 [ 6 10  4 20]
24 B 20 [ 5 20  6  8]
27 D 20 [20 13 19 20]