I have 100 elements. Each element has 4 features A,B,C,D. Each feature is an integer.
I want to select 2 elements for each feature, so that I have selected a total of 8 distinct elements. I want to maximize the sum of the 8 selected features A,A,B,B,C,C,D,D.
A greedy algorithm would be to select the 2 elements with highest A, then the two elements with highest B among the remaining elements, etc. However, this might not be optimal, because the elements that have highest A could also have a much higher B.
Do we have an algorithm to solve such a problem optimally?
CodePudding user response:
This can be solved as a minimum cost flow problem. In particular, this is an Assignment problem
First of all, see that we only need the 8 best elements of each features, meaning 32 elements maximum. It should even be possible to cut the search space further (as if the 2 best elements of A is not one of the 6 best elements of any other feature, we can already assigne those 2 elements to A, and each other feature only needs to look at the first 6 best elements. If it's not clear why, I'll try to explain further).
Then we make the vertices S,T and Fa,Fb,Fc,Fd and E1,E2,...E32, with the following edge :
- for each vertex Fx, an edge from S to Fx with maximum flow 2 and a weight of 0 (as we want 2 element for each feature)
- for each vertex Ei, an edge from Fx to Ei if Ei is one of the top elements of feature x, with maximum flow 1 and weight equal to the negative value of feature x of Ei. (negative because the algorithm will find the minimum cost)
- for each vertex Ei, an edge from Ei to T, with maximum flow 1 and weight 0. (as each element can only be selected once)
I'm not sure if this is the best way, but It should work.
CodePudding user response:
As suggested per @AloisChristen, this can be written as an assignment problem:
- On the one side, we select the 8 best elements for each feature; that's 32 elements or less, since one element might be in the best 8 for more than one feature;
- On the other side, we put 8 seats A,A,B,B,C,C,D,D
- Solve the resulting assignment problem.
Here the problem is solved using scipy's linear_sum_assignment
optimization function:
from numpy.random import randint
from numpy import argpartition, unique, concatenate
from scipy.optimize import linear_sum_assignment
# PARAMETERS
n_elements = 100
n_features = 4
n_per_feature = 2
# RANDOM DATA
data = randint(0, 21, (n_elements, n_features)) # random data with integer features between 0 and 20 included
# SELECT BEST 8 CANDIDATES FOR EACH FEATURE
n_selected = n_features * n_per_feature
n_candidates = n_selected * n_features
idx = argpartition(data, range(-n_candidates, 0), axis=0)
idx = unique(idx[-n_selected:].ravel())
candidates = data[idx]
n_candidates = candidates.shape[0]
# SOLVE ASSIGNMENT PROBLEM
cost_matrix = -concatenate((candidates,candidates), axis=1) # 8 columns in order ABCDABCD
element_idx, seat_idx = linear_sum_assignment(cost_matrix)
score = -cost_matrix[element_idx, seat_idx].sum()
# DISPLAY RESULTS
print('SUM OF SELECTED FEATURES: {}'.format(score))
for e,s in zip(element_idx, seat_idx):
print('{:2d}'.format(idx[e]),
'ABCDABCD'[s],
-cost_matrix[e,s],
data[idx[e]])
Output:
SUM OF SELECTED FEATURES: 160
3 B 20 [ 5 20 14 11]
4 A 20 [20 9 3 12]
6 C 20 [ 3 3 20 8]
10 A 20 [20 10 9 9]
13 C 20 [16 12 20 18]
23 D 20 [ 6 10 4 20]
24 B 20 [ 5 20 6 8]
27 D 20 [20 13 19 20]