I would like to know how to use the Python random.sample() function within a for-loop to generate multiple sample lists that are not identical.
For example, right now I have:
import random

for i in range(3):
    sample = random.sample(range(10), k=2)
This will generate 3 sample lists containing two numbers each, but I would like to make sure none of those sample lists are identical, regardless of order. (It is okay if there are repeating values, e.g., (2,1), (3,2), (3,7) would be okay, but (2,1), (1,2), (5,4) would not.)
Answer:
If you specifically need to "use random.sample() within a for-loop", then you could keep track of samples that you've seen, and check that new ones haven't been seen yet.
import random

seen = set()
for i in range(3):
    while True:
        sample = random.sample(range(10), k=2)
        print(f'TESTING: {sample = }')  # For demo; the {sample = } form needs Python 3.8+
        fr = frozenset(sample)  # order-insensitive, hashable key
        if fr not in seen:
            seen.add(fr)
            break
    print(sample)
Example output:
TESTING: sample = [0, 7]
[0, 7]
TESTING: sample = [0, 7]
TESTING: sample = [1, 5]
[1, 5]
TESTING: sample = [7, 0]
TESTING: sample = [3, 5]
[3, 5]
Here I made seen a set to allow fast lookups, and I converted sample to a frozenset so that order doesn't matter in comparisons. It has to be frozen because set elements must be hashable, and a regular (mutable) set is not, so a set can't contain another set.
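A quick way to check both claims in the interpreter (a small illustration, not part of the original answer; the variable names are arbitrary):

```python
# frozenset ignores element order, so both orderings produce the same key
key_a = frozenset([2, 1])
key_b = frozenset([1, 2])
print(key_a == key_b)  # True

# Frozensets are hashable, so they can be stored in (and looked up from) a set
seen = {key_a}
print(key_b in seen)  # True: [1, 2] counts as already seen

# A regular set is mutable and therefore unhashable, so it cannot be a set element
try:
    nested = {set([1, 2])}
except TypeError:
    print("TypeError: a plain set cannot be stored inside another set")
```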
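However, this could be very slow with different inputs, especially a larger range of i or a smaller range to draw samples from. In theory its runtime is unbounded, since nothing stops it from drawing an already-seen sample again and again, but in practice random's pseudorandom number generator is finite. And if you ask for more samples than there are distinct pairs, the loop never terminates.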
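To get a feel for the slowdown, here is a rough sketch that counts how many random.sample() calls the seen-set loop makes. The helper draws_needed and its seed parameter are hypothetical, added only for illustration, and the guard borrows the math.comb check from another answer in this thread.

```python
import math
import random


def draws_needed(n, m, k, seed=None):
    """Hypothetical helper: count how many random.sample() calls the seen-set
    loop makes to collect n distinct k-element samples from range(m)."""
    # Without this guard the loop below would never finish
    if n > math.comb(m, k):
        raise ValueError(f"only {math.comb(m, k)} distinct samples exist")
    rng = random.Random(seed)  # seeded generator so runs are repeatable
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(frozenset(rng.sample(range(m), k)))
        draws += 1
    return draws


print(draws_needed(3, 10, 2, seed=1))   # small: repeats are rare when n is far below comb(m, k)
print(draws_needed(45, 10, 2, seed=1))  # collecting all 45 pairs takes many more draws than 45
```

Collecting every one of the 45 pairs is the coupon-collector situation, so on average it takes about 45 × (1 + 1/2 + ... + 1/45), roughly 198 draws, and the gap grows as the number of possible pairs grows.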
Alternatives
There are other ways to do the same thing that could be much more performant. For example, you could take a big random sample, then chunk it into the desired size:
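import random

def chunks(lst, size):  # helper not defined in the original; yields consecutive size-length slices
    for start in range(0, len(lst), size):
        yield lst[start:start + size]

n = 3       # number of samples
k = 2       # numbers per sample
upper = 10  # draw from range(upper); requires k*n <= upper
sample = random.sample(range(upper), k=k*n)
for chunk in chunks(sample, k):
    print(chunk)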
Example output:
[6, 5]
[3, 0]
[1, 8]
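With this approach, you'll never get duplicate numbers across samples, as in [[2,1], [3,2], [3,7]] where 2 and 3 each appear twice, because the big sample contains only unique numbers. That is stricter than the question requires, and it only works when k*n <= upper; otherwise random.sample raises a ValueError.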
This approach was inspired by Sven Marnach's answer on "Non-repetitive random number in numpy", which I coincidentally just read today.
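If you want exactly what the question asks (no two samples identical, but numbers may repeat across samples) without a retry loop, one option not taken from the answers here is to enumerate every possible pair with itertools.combinations and let random.sample pick n distinct ones. This is a sketch that assumes the population is small, since it builds a list of all comb(m, k) combinations in memory.

```python
import itertools
import random

m, k, n = 10, 2, 3  # draw from range(m), k numbers per sample, n samples

# Every order-insensitive k-element sample, as sorted tuples: (0, 1), (0, 2), ...
all_combos = list(itertools.combinations(range(m), k))

# random.sample picks n *distinct* combinations, so no two samples are the same,
# while a number may still appear in several samples (e.g. (2, 3) and (3, 7))
samples = random.sample(all_combos, k=n)
for s in samples:
    print(list(s))
```

If n exceeds the number of combinations, random.sample raises a ValueError instead of looping forever.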
Answer:
It looks like you are trying to make a nested list of items drawn without repetition from the original list. You can try the code below.
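import random

mylist = list(range(50))

def randomlist(mylist, k):
    newlist = []
    # Pop k random items at a time until fewer than k remain (this empties mylist in place)
    while len(mylist) >= k:
        newlist.append([mylist.pop(random.randrange(len(mylist))) for _ in range(k)])
    # Any leftovers (fewer than k items) become a final, shorter group
    if mylist:
        newlist.append(mylist)
    return newlist

randomlist(mylist, 6)

Output in an interactive session: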
[[2, 20, 36, 46, 14, 30],
[4, 12, 13, 3, 28, 5],
[45, 37, 18, 9, 34, 24],
[31, 48, 11, 6, 19, 17],
[40, 38, 0, 7, 22, 42],
[23, 25, 47, 41, 16, 39],
[8, 33, 10, 43, 15, 26],
[1, 49, 35, 44, 27, 21],
[29, 32]]
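Because randomlist() pops from mylist, the caller's list is empty afterwards, and each pop from the middle of a list shifts the remaining items, so the whole thing is quadratic in the list length. A non-mutating alternative, sketched here with a hypothetical random_groups helper, is to shuffle a copy once and slice it into groups of k:

```python
import random


def random_groups(items, k):
    """Hypothetical non-mutating variant: shuffle a copy, then slice it into
    groups of k; the last group holds any leftovers (fewer than k items)."""
    pool = list(items)    # copy, so the caller's list is left untouched
    random.shuffle(pool)  # random order, each item exactly once
    return [pool[i:i + k] for i in range(0, len(pool), k)]


mylist = list(range(50))
groups = random_groups(mylist, 6)
print(groups)
print(len(mylist))  # 50: the original list was not consumed
```

Slicing also avoids appending an empty trailing group when the list length is an exact multiple of k.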
Answer:
This should do the trick.
import math
import random
import sys

# create set to store samples
a = set()
# number of distinct elements in the population
m = 10
# sample size
k = 2
# number of samples
n = 3

# this protects against an infinite loop (see Safety Note); math.comb needs Python 3.8+
if n > math.comb(m, k):
    print(
        f"Error: cannot draw {n} distinct samples; there are only {math.comb(m, k)} "
        f"{k}-combinations of {m} distinct elements."
    )
    sys.exit(1)

# the meat: sorting makes [2, 1] and [1, 2] the same tuple (1, 2)
while len(a) < n:
    a.add(tuple(sorted(random.sample(range(m), k=k))))
print(a)
With a set you are guaranteed to get a collection with no duplicate elements. However, a set would still allow both (1, 2) and (2, 1), because they are different tuples, which is why sorted is applied. So if [1, 2] is drawn, sorted([1, 2]) returns [1, 2]. And if [2, 1] is subsequently drawn, sorted([2, 1]) returns [1, 2], which won't be added to the set because (1, 2) is already in the set. We use tuple because objects in a set have to be hashable, and list objects are not.
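Here is a small illustration of that canonicalization step on its own (the names key1 and key2 are just for the demo):

```python
# sorted() puts both orderings into the same canonical form
key1 = tuple(sorted([2, 1]))
key2 = tuple(sorted([1, 2]))
print(key1, key2)  # (1, 2) (1, 2)

a = {key1}
a.add(key2)    # no effect: (1, 2) is already in the set
print(len(a))  # 1

# A list is unhashable, which is why the answer converts to tuple first
try:
    a.add(sorted([3, 4]))
except TypeError:
    print("TypeError: lists cannot be stored in a set")
```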
I hope this helps. Any questions, please let me know.
Safety Note
To avoid an infinite loop when you change n from 3 to some large number, you need to know the maximum number of possible samples of the type that you desire.
The relevant mathematical concept for this is a combination.
- Suppose the first argument you pass to random.sample() is range(m), where m is some arbitrary positive integer. Note that this means that the sample will be drawn from a population of m distinct members without replacement.
- Suppose that you wish to have n samples of length k in total.
The number of possible k-combinations from the set of m distinct elements is

m! / (k! * (m - k)!)
You can get this value via
from math import comb
num_comb = comb(m, k)
comb(m, k) gives the number of different ways to choose k elements from m elements without repetition and without order, which is exactly what we want.
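As a sanity check, the factorial formula, math.comb, and a brute-force count with itertools.combinations should all agree (a small illustration, not part of the original answer):

```python
import itertools
import math

m, k = 10, 2
by_formula = math.factorial(m) // (math.factorial(k) * math.factorial(m - k))
by_comb = math.comb(m, k)  # Python 3.8+
by_enumeration = sum(1 for _ in itertools.combinations(range(m), k))
print(by_formula, by_comb, by_enumeration)  # 45 45 45
```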
So in the example above, m = 10, k = 2, and n = 3.
With these m and k, the number of possible k-combinations from the set of m distinct elements is 10! / (2! * 8!) = (10 * 9) / 2 = 45.
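You need to ensure that n is at most 45 if you want to use those specific m and k and avoid an infinite loop. (As n gets close to 45, most later draws hit pairs that are already in the set, so the loop also gets slower.)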