I'm trying to implement Integer Programming for Nearest Neighbor Classifier in python using cvxpy
.
Short intro
Given a dataset of n
points with a color (red or blue) we would like to choose the minimal number of candidate points, s.t for each point that isn`t a candidate, its closest candidate has the same color.
My flow
Given a set of n
points (with colors) define an indicator vector I
(|I| = n
),
I_i = 1 if and only if vertex i is chosen as a candidate
In addition, I defined two more vectors, named as A
and B
(|A| = |B| = n
) as follow:
A_i = the distance between v_i to it's closest candidate with the **same** color
B_i = the distance between v_i to it's closest candidate with a **different** color
Therefore, I have n
constrains which are:
B_i > A_i
for any i
My target is to minimize the sum of vector I
(which represents the number of candidates)
My Issue
Its seems that the vectors A
, B
are changing because they affected by I
, since when a candidate is chosen, it is affecting its entry in I
which affects A
and B
and the constrains are dependent on those vectors..
Any suggestions?
Thanks !
CodePudding user response:
To recap: you want to find the smallest set of examples belonging to a given training set such that the resulting nearest neighbor classifier achieves perfect accuracy on that training set.
I would suggest that you formulate this as follows. Create a 0–1 variable x(e) for each example e indicating whether e is chosen. For each ordered pair of examples e and e′ with different labels, write a constraint
x(e′) ≤ ∑e′′∈C(e,e′) x(e′′)
where C(e, e′) is the set of examples e′′ with the same label as e such that e′′ is closer to e than e′ is to e (including e′′ = e). This means that, if e′ is chosen, then it is not the nearest chosen example to e.
We also need
∑e x(e) ≥ 1
to disallow the empty set. Finally, the objective is
minimize ∑e x(e).