Home > Blockchain >  Efficient way to get a k-subset from a list of numbers in which numbers are at least d apart?
Efficient way to get a k-subset from a list of numbers in which numbers are at least d apart?

Time:09-21

Say I have a list of 10 numbers as below

lst = [1, 3, 6, 10, 15, 20, 27, 28, 30, 40]

I want to get one subset of k numbers, in which every pair of numbers are at least different from each other by a distance d. Right now I am using itertools.combinations, and the code is working fine for small lists.

from itertools import combinations

def k_subset_at_least_d_apart(nums, k, d):
    for subset in combinations(nums, k):
        a = sorted(subset)
        if min([t - s for s, t in zip(a, a[1:])]) >= d:
            return subset
    return None

For example, if I want to get a subset of 5 with numbers at least 6 apart from each other:

subset = k_subset_at_least_d_apart(lst, k=5, d=6)
print(subset)
# (1, 10, 20, 27, 40)

However, my code becomes too slow when I want to e.g. get a subset of 20 numbers from a list of 50 numbers in which numbers are at least 10 apart. Can anyone suggest a relatively fast algorithm that can first determine whether such subset exists or not, then finds one subset? Thanks in advance.

CodePudding user response:

Sure; you can just greedily repeat the step of taking the smallest valid element:

def k_subset_at_least_d_apart(nums, k, d):
    last = -float('inf')
    answer = []
    for element in sorted(nums):
        if element - last >= d:
            answer.append(element)
            last = element
            if len(answer) == k:
                return answer
    return None

If nums aren't presorted, it's hard to do (asymptotically) much better than this code, which takes O(n log n) time. You can get an O(n*k) algorithm with k repeated passes, which may be faster depending on n and k. If they are sorted, you can do the 'greedily take min valid', but with a binary search to find the next smallest valid element, for an O(k log n) algorithm.

Proof of the greedy algorithm:

Suppose the greedy algorithm gives a solution G = g0, g1, ... gm and an optimal (length-k) solution is given by A = a0, a1, ... a_(k-1), with m <= k-1 (both in sorted, increasing order).

Let i be the smallest index where ai != gi. If i is 0, we must have g0 < a0, since g0 is min(nums), so we can replace a0 with g0 in A for another optimal solution A' = g0, a1, ... a_(k-1). Otherwise, for i > 0, (details left as an exercise, but very similar to above), if a0 == g0, a1 == g1 ... a_(i-1)==g_(i-1), we can also replace ai with gi to get another optimal solution.

Eventually we get that there exists an optimal solution A* such that G is a prefix of A*, then we can argue by contradiction that if G has length below k and is a proper prefix of an optimal solution, the greedy algorithm would have extended G when it saw the element a_(m 1).

  • Related