How to use random.sample() within a for-loop to generate multiple, *non-identical* sample lists?

I would like to know how to use the python random.sample() function within a for-loop to generate multiple sample lists that are not identical.

For example, right now I have:

import random

for i in range(3):
    sample = random.sample(range(10), k=2)

This will generate 3 sample lists containing two numbers each, but I would like to make sure none of those sample lists are identical. (It is okay if there are repeating values, i.e., (2,1), (3,2), (3,7) would be okay, but (2,1), (1,2), (5,4) would not.)

CodePudding user response：

If you specifically need to "use random.sample() within a for-loop", then you could keep track of samples that you've seen, and check that new ones haven't been seen yet.

import random

seen = set()
for i in range(3):
    while True:
        sample = random.sample(range(10), k=2)
        print(f'TESTING: {sample = }')  # For demo
        fr = frozenset(sample)
        if fr not in seen:
            seen.add(fr)
            break
    print(sample)

Example output:

TESTING: sample = [0, 7]
[0, 7]
TESTING: sample = [0, 7]
TESTING: sample = [1, 5]
[1, 5]
TESTING: sample = [7, 0]
TESTING: sample = [3, 5]
[3, 5]

Here I made seen a set to allow fast lookups, and I converted each sample to a frozenset so that order doesn't matter in comparisons. It has to be a frozenset because the elements of a set must be hashable, and an ordinary (mutable) set is not.
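To make the role of frozenset concrete, here is a small sketch. The values are made up for illustration and are not taken from a real run.

```python
# Two draws containing the same numbers in a different order
a = frozenset([0, 7])
b = frozenset([7, 0])
print(a == b)  # True: sets ignore order

seen = {a}
print(b in seen)  # True: b is recognised as a repeat of a

# An ordinary set is mutable and therefore unhashable,
# so it cannot be stored inside another set
try:
    {set([0, 7])}
except TypeError as err:
    print(f"TypeError: {err}")
```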

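However, this could be very slow with different inputs, especially if you ask for more samples (a larger range for i) or draw from a smaller range, because more draws get rejected as repeats. In theory, its runtime is unbounded, since the same samples could keep coming up; in practice, random's number generator is finite, so the loop finishes as long as enough distinct samples exist. If you ask for more samples than there are distinct combinations, it never finishes.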
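To get a feel for how slow the retry loop can become, the sketch below estimates the average number of draws it needs. It assumes every k-combination is equally likely on each draw; expected_draws is a hypothetical helper name, not part of the random module.

```python
from math import comb

def expected_draws(m, k, n):
    """Average number of draws needed to collect n distinct k-combinations
    from range(m), assuming each combination is equally likely per draw."""
    total = comb(m, k)
    if n > total:
        raise ValueError(f"only {total} distinct samples exist")
    # With i samples already seen, a draw is new with probability
    # (total - i) / total, so it takes total / (total - i) draws on average.
    return sum(total / (total - i) for i in range(n))

print(round(expected_draws(10, 2, 3), 2))  # 3.07
print(expected_draws(10, 2, 45))           # far more draws once nearly all pairs are used
```

For the inputs in the question the overhead is negligible; it only grows noticeably when n approaches the number of possible combinations.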

Alternatives

There are other ways to do the same thing that could be much more performant. For example, you could take a big random sample, then chunk it into the desired size:

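import random

n = 3  # number of samples
k = 2  # size of each sample
upper = 10
sample = random.sample(range(upper), k=k*n)
# Split the flat sample into n consecutive chunks of k items each
for start in range(0, len(sample), k):
    chunk = sample[start:start + k]
    print(chunk)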

Example output:

[6, 5]
[3, 0]
[1, 8]

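With this approach, no number is ever repeated across the sample lists, so a result like [[2,1], [3,2], [3,7]] can't occur, because the big sample contains only unique numbers. That is stricter than what the question asks for. It also requires k*n to be no larger than upper; otherwise random.sample() raises a ValueError.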
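Because the big sample is drawn without replacement, this approach can only produce a limited number of chunks. The sketch below shows that limit and what happens beyond it; max_chunks is a hypothetical helper introduced here for illustration.

```python
import random

def max_chunks(upper, k):
    """Largest n for which random.sample(range(upper), k=k*n) is valid."""
    return upper // k

print(max_chunks(10, 2))  # 5

# Asking for 6 chunks of 2 needs 12 unique numbers from a pool of 10
try:
    random.sample(range(10), k=2 * 6)
except ValueError as err:
    print(f"ValueError: {err}")
```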

(This approach was inspired by Sven Marnach's answer on "Non-repetitive random number in numpy", which I coincidentally just read today.)

CodePudding user response：

It looks like you are trying to build a nested list by drawing items from the original list without repetition. You can try the code below.

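import random

mylist = list(range(50))

def randomlist(mylist, k):
    """Split mylist into random groups of k items; leftovers form the last group."""
    remaining = list(mylist)  # work on a copy so the caller's list is not emptied
    newlist = []
    while len(remaining) >= k:
        # Pop k random items, so no item can be used twice
        newlist.append([remaining.pop(random.randrange(len(remaining))) for _ in range(k)])
    if remaining:
        # Fewer than k items are left over; keep them as a final, shorter group
        newlist.append(remaining)
    return newlist

randomlist(mylist, 6)

Example output:
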
[[2, 20, 36, 46, 14, 30],
 [4, 12, 13, 3, 28, 5],
 [45, 37, 18, 9, 34, 24],
 [31, 48, 11, 6, 19, 17],
 [40, 38, 0, 7, 22, 42],
 [23, 25, 47, 41, 16, 39],
 [8, 33, 10, 43, 15, 26],
 [1, 49, 35, 44, 27, 21],
 [29, 32]]

CodePudding user response：

This should do the trick.

import random
import math

# create set to store samples
a = set()
# number of distinct elements in the population
m = 10
# sample size
k = 2
# number of samples
n = 3

# this protects against an infinite loop (see Safety Note)
if n > math.comb(m, k):
    # SystemExit prints the message to stderr and stops the script
    raise SystemExit(
        f"Error: {math.comb(m, k)} is the number of {k}-combinations "
        f"from a set of {m} distinct elements."
    )

# the meat
while len(a) < n:
    a.add(tuple(sorted(random.sample(range(m), k = k))))

print(a)

With a set you are guaranteed to get a collection with no duplicate elements. However, a set treats (1, 2) and (2, 1) as different elements and would hold both, which is why sorted is applied first. If [1, 2] is drawn, sorted([1, 2]) returns [1, 2], which is stored as (1, 2). If [2, 1] is subsequently drawn, sorted([2, 1]) also returns [1, 2], and the resulting (1, 2) isn't added again because it is already in the set. We convert to a tuple because objects in a set have to be hashable, and list objects are not.
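The normalisation step can be tried on its own. This is a small sketch with hand-picked draws rather than random ones, so the result is predictable.

```python
a = set()
for drawn in ([1, 2], [2, 1], [3, 2]):
    key = tuple(sorted(drawn))  # [2, 1] becomes (1, 2)
    a.add(key)                  # adding an existing key changes nothing

print(sorted(a))  # [(1, 2), (2, 3)]

# A list cannot be stored in a set because it is not hashable
try:
    a.add([1, 2])
except TypeError as err:
    print(f"TypeError: {err}")
```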

I hope this helps. Any questions, please let me know.

Safety Note

To avoid an infinite loop when you change n (3 in the code above) to some large number, you need to know the maximum number of possible samples of the type that you desire.

The relevant mathematical concept for this is a combination.

Suppose your first argument of random.sample() is range(m) where m is some arbitrary positive integer. Note that this means that the sample will be drawn from a population of m distinct members without replacement.
Suppose that you wish to have n samples of length k in total.

The number of possible k-combinations from the set of m distinct elements is

m! / (k! * (m - k)!)

You can get this value via

from math import comb
num_comb = comb(m, k)

comb(m, k) gives the number of different ways to choose k elements from m elements without repetition and without order, which is exactly what we want.

So in the example above, m = 10, k = 2, n = 3.

With these m and k, the number of possible k-combinations from the set of m distinct elements is 45.
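As a quick check of that figure, the factorial formula and math.comb can be compared directly:

```python
from math import comb, factorial

m, k = 10, 2
# m! / (k! * (m - k)!) written out with factorials
by_formula = factorial(m) // (factorial(k) * factorial(m - k))

print(by_formula)  # 45
print(comb(m, k))  # 45
```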

You need to ensure that n is no greater than 45 if you want to use those specific m and k and avoid an infinite loop. This matches the check n > math.comb(m, k) in the code above.
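If you want to reuse this with other values, the same idea can be wrapped in a function. This is only a sketch of one way to package it: distinct_samples is a hypothetical name, and raising ValueError instead of exiting is a design choice made here so that callers can handle the error.

```python
import random
from math import comb

def distinct_samples(m, k, n):
    """Return n distinct, order-insensitive samples of size k from range(m)."""
    limit = comb(m, k)
    if n > limit:
        # Without this guard the while loop below could never finish
        raise ValueError(f"only {limit} distinct samples exist, but {n} were requested")
    found = set()
    while len(found) < n:
        found.add(tuple(sorted(random.sample(range(m), k=k))))
    return found

print(distinct_samples(10, 2, 3))
```

The output differs on every run, but it always contains exactly n sorted tuples of length k.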