I have a list called marbles with 10,000 items (5000 blue and 5000 red).
I want to run a test. To pick 4 random items from the list, I do this:
import random
marbles = []  # empty here, but it's actually a list of 10,000 items
A = random.choices(marbles, k=4)
print(A)  # this will print a list of 4 random items from the list
What I need to do is run this test 100 times and print the results. I want to avoid creating 100 different variables and then printing them all. How can I avoid writing more than 100 lines of code? With a for loop? I would appreciate any input. Thank you in advance.
There does not seem to be any problem with the list itself.
CodePudding user response:
Use a for loop.
for x in range(4):
    print(random.choice(marbles))
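To repeat the four-item draw 100 times without creating 100 variables, the same loop idea can collect each draw in a list. This is only a minimal sketch: it assumes the 5000/5000 list from the question, and the names NUM_TESTS, PICKS_PER_TEST and results are made up for illustration.

```python
import random

# Stand-in for the question's list: 5000 blue and 5000 red marbles.
marbles = ['blue'] * 5000 + ['red'] * 5000

NUM_TESTS = 100      # how many times to repeat the test
PICKS_PER_TEST = 4   # items drawn in each test

results = []  # one list of 4 marbles per test
for _ in range(NUM_TESTS):
    # random.choices draws with replacement, as in the question
    results.append(random.choices(marbles, k=PICKS_PER_TEST))

# Print every test with its number
for number, picks in enumerate(results, start=1):
    print(number, picks)
```

Keeping the draws in one list means they can still be counted or inspected after printing.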
CodePudding user response:
Sampling with and without replacement
It's important to understand the difference between sampling with replacement and without replacement. Say we have a bag with 1 blue and 2 red marbles, and you draw 2 marbles. If you put the first marble back before drawing the second, it's possible to end up with 2 blue marbles. This is called sampling with replacement. Calling random.choice repeatedly is sampling with replacement.
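The difference can be checked with the small bag described above. This is a rough sketch rather than a proof; the variable names and the 1000 repetitions are arbitrary choices for illustration.

```python
import random

bag = ['blue', 'red', 'red']  # 1 blue and 2 red marbles

# With replacement: the same marble can be drawn twice.
with_replacement = [tuple(random.choices(bag, k=2)) for _ in range(1000)]

# Without replacement: each marble can be drawn at most once.
without_replacement = [tuple(random.sample(bag, 2)) for _ in range(1000)]

print(('blue', 'blue') in with_replacement)     # very likely True
print(('blue', 'blue') in without_replacement)  # always False
```

Two blues can only show up in the first list, because the bag holds a single blue marble.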
random.choices() and random.sample()
You can draw more than one element using the choices() function from the random module. For example, sampling 4 marbles with replacement from a bag of 1 red and 2 blue marbles:
>>> import random
>>> marbles = ['red'] * 1 + ['blue'] * 2
>>> random.choices(marbles, k=4)
['red', 'blue', 'blue', 'blue']
You can sample without replacement using the sample() function from the random module:
>>> random.sample(marbles, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py", line 482, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
As expected, this gives an error. You can't draw 4 marbles from a bag of 3 without putting any back. Now if we put 1000 red marbles and 2000 blue marbles in the bag, we get:
>>> marbles = ['red'] * 1000 + ['blue'] * 2000
>>> random.sample(marbles, 4)
['blue', 'blue', 'blue', 'red']
Memory usage and weights
A possible problem with the examples above is that the list grows with the number of marbles, so a large bag needs a lot of memory. To avoid this, the choices() function has a weights parameter. You can use it like this:
>>> marbles = ['red', 'blue']
>>> weights = [1000, 2000]
>>> random.choices(marbles, weights=weights, k=4)
['blue', 'blue', 'blue', 'red']
Sadly, the random module doesn't have a function for sampling without replacement using arbitrary (non-integer) weights.
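When the weights are whole-number marble counts, as they are here, random.sample() accepts a counts parameter from Python 3.9 onward, which samples without replacement without building the full list. A minimal sketch, assuming Python 3.9 or newer; the names colours and draw are illustrative only.

```python
import random

colours = ['red', 'blue']
counts = [1000, 2000]  # whole-number marble counts, not fractional weights

# Equivalent to sampling from ['red'] * 1000 + ['blue'] * 2000
# without building that list (requires Python 3.9 or newer).
draw = random.sample(colours, counts=counts, k=4)
print(draw)
```

As with a plain list, asking for more marbles than the counts add up to raises a ValueError.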
Repeated sampling using for loop
Finally, we need to repeat the sampling and count the outcomes. A more advanced way to do this is to use dictionaries and defaultdict from the collections module. As a simpler alternative, we will build a list of outcomes and loop over the distinct outcomes using a set of that list.
import random

SAMPLE_SIZE = 4
REPEAT_SAMPLING = 100

outcomes = []
marbles = ['red'] * 5000 + ['blue'] * 5000

for i in range(REPEAT_SAMPLING):
    # Join each sample into one string so identical samples compare equal
    outcome = ', '.join(random.sample(marbles, SAMPLE_SIZE))
    outcomes.append(outcome)

# Print how often each distinct outcome occurred
for outcome in set(outcomes):
    print(f'{outcome} appeared {outcomes.count(outcome)} times out of {REPEAT_SAMPLING}')
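The defaultdict approach mentioned earlier could look roughly like this. It is a sketch under the same assumptions as the list version (5000 red and 5000 blue marbles, sampling without replacement); the name tally is made up for illustration.

```python
import random
from collections import defaultdict

SAMPLE_SIZE = 4
REPEAT_SAMPLING = 100

marbles = ['red'] * 5000 + ['blue'] * 5000

# Missing keys start at 0, so each outcome can be incremented directly.
tally = defaultdict(int)
for _ in range(REPEAT_SAMPLING):
    outcome = ', '.join(random.sample(marbles, SAMPLE_SIZE))
    tally[outcome] += 1

for outcome, count in tally.items():
    print(f'{outcome} appeared {count} times out of {REPEAT_SAMPLING}')
```

This counts in a single pass, whereas outcomes.count() rescans the whole list once per distinct outcome.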