Say I have a list of 10 numbers as below
lst = [1, 3, 6, 10, 15, 20, 27, 28, 30, 40]
I want to get one subset of k
numbers, in which every pair of numbers are at least different from each other by a distance d
. Right now I am using itertools.combinations
, and the code is working fine for small lists.
from itertools import combinations
def k_subset_at_least_d_apart(nums, k, d):
for subset in combinations(nums, k):
a = sorted(subset)
if min([t - s for s, t in zip(a, a[1:])]) >= d:
return subset
return None
For example, if I want to get a subset of 5 with numbers at least 6 apart from each other:
subset = k_subset_at_least_d_apart(lst, k=5, d=6)
print(subset)
# (1, 10, 20, 27, 40)
However, my code becomes too slow when I want to e.g. get a subset of 20 numbers from a list of 50 numbers in which numbers are at least 10 apart. Can anyone suggest a relatively fast algorithm that can first determine whether such subset exists or not, then finds one subset? Thanks in advance.
CodePudding user response:
Sure; you can just greedily repeat the step of taking the smallest valid element:
def k_subset_at_least_d_apart(nums, k, d):
last = -float('inf')
answer = []
for element in sorted(nums):
if element - last >= d:
answer.append(element)
last = element
if len(answer) == k:
return answer
return None
If nums aren't presorted, it's hard to do (asymptotically) much better than this code, which takes O(n log n)
time. You can get an O(n*k)
algorithm with k repeated passes, which may be faster depending on n and k. If they are sorted, you can do the 'greedily take min valid', but with a binary search to find the next smallest valid element, for an O(k log n)
algorithm.
Proof of the greedy algorithm:
Suppose the greedy algorithm gives a solution G = g0, g1, ... gm
and an optimal (length-k) solution is given by A = a0, a1, ... a_(k-1)
, with m <= k-1
(both in sorted, increasing order).
Let i
be the smallest index where ai != gi
. If i is 0, we must have g0 < a0
, since g0
is min(nums)
, so we can replace a0
with g0
in A for another optimal solution A' = g0, a1, ... a_(k-1)
. Otherwise, for i > 0, (details left as an exercise, but very similar to above), if a0 == g0, a1 == g1 ... a_(i-1)==g_(i-1)
, we can also replace ai
with gi
to get another optimal solution.
Eventually we get that there exists an optimal solution A*
such that G
is a prefix of A*
, then we can argue by contradiction that if G
has length below k
and is a proper prefix of an optimal solution, the greedy algorithm would have extended G
when it saw the element a_(m 1)
.