Home > Net >  Fastest way to randomly sample N elements from a Python list with exclusions
Fastest way to randomly sample N elements from a Python list with exclusions

Time:12-10

Say I have a list of 20 unique numbers and I want to randomly sample N = 3 numbers from the list. For each number, there is the limitation that it cannot be in the final sample with some other numbers, given in a dictionary exclusions.

lst = list(range(20))
N = 3
exclusions = {0: [5], 1: [3, 13], 2: [10], 3: [1, 4, 18],
              4: [3, 15, 17, 19], 5: [0], 6: [12], 7: [13, 15],
              8: [10], 9: [16], 10: [2, 8, 12], 11: [15],
              12: [6, 10], 13: [1, 7], 14: [], 15: [4, 7, 11],
              16: [9], 17: [4], 18: [3], 19: [4]}

Right now I am using trial-and-error:

import random
sample = {random.choice(lst)}
n_sample = 1
while n_sample < N:
    s = random.choice([x for x in lst if x not in sample])
    if not set(exclusions[s]).isdisjoint(sample):
        sample.add(s)
        n_sample  = 1

print(sample)
# {10, 2, 12}

However, this is super inefficient and cannot catch the case when there is no solution at all, especially when N is large. Can anyone suggest an efficient way to do this in Python?

CodePudding user response:

If exclusions is pre-computed, you could also pre-compute the allowed sets which satisfy the exclusion constraints. You can do this as per @Sembei Norimaki's suggestion:

Make a set with all numbers, then pick the first number. Then look at the restrictions dictionary and remove the restrictions from your set. Repeat with the remaining elements for the other N-1 numbers

Then at run-time you can randomly pick an allowed set, and sample your N numbers from that set.

Note: The resulting distribution of sampled values will not be uniform here. This is because elements with more exclusions will therefore be excluded from more of the allowed sets. When sampling, given a large amount of samples, your distribution will converge to the distribution of that which you are sampling from.

CodePudding user response:

As already suggested, sets are good companion for these kind of problems:

If you need a random choice the

[x for x in lst if x not in sample]

can be improved using sets

candidates = set(sample) - set(last)

anyway you must avoid the usage of random into the while loop:

hint

Is not clear to me how to handle the 'no solution' case:

lst = set(range(20))
N = 3
exclusions = {0: [5], 1: [3, 13], 2: [10], 3: [1, 4, 18],
              4: [3, 15, 17, 19], 5: [0], 6: [12], 7: [13, 15],
              8: [10], 9: [16], 10: [2, 8, 12], 11: [15],
              12: [6, 10], 13: [1, 7], 14: [], 15: [4, 7, 11],
              16: [9], 17: [4], 18: [3], 19: [4]}
exclusions = {k:set(v) for k,v in exclusions.items()}
sample = {random.choice(list(lst))}
n_sample = 1
while n_sample < N:
    found = False
    for s in lst-sample:
        if not exclusions[s].isdisjoint(sample):
            sample.add(s)
            n_sample  = 1
            found = True
            break
    if not found:
        print('No solution')
        break
    
print(sample)
  • Related