How can I perform multiple random.choices tests

Time:11-26

I have a list called marbles with 10,000 items (5,000 blue and 5,000 red).

I want to run a test. To pick 4 random items from the list, I do this:

import random

marbles = []  # empty here, but it's actually a list of 10,000 items

A = random.choices(marbles, k=4)
print(A)  # this will print a list of 4 random items from the list

What I need to do is perform this test 100 times and print the results. I want to avoid creating 100 different variables and then printing them all. How can I do this without writing more than 100 lines of code? With a for loop? I would appreciate any input. Thank you in advance.

There doesn't seem to be any problem with the list itself.

CodePudding user response:

Use a for loop.

for x in range(4):
    print(random.choice(marbles))
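
The loop above prints four single marbles. To repeat the whole 4-marble draw 100 times, as the question asks, the draw itself can go inside the loop. The following is a minimal sketch, not the only way to do it; the names NUM_TESTS, DRAW_SIZE and results are made up for illustration, and the marbles list is a stand-in built to match the description in the question.

```python
import random

# Stand-in for the asker's 10,000-item list (5,000 of each color).
marbles = ['red'] * 5000 + ['blue'] * 5000

NUM_TESTS = 100  # hypothetical name: how many times to repeat the draw
DRAW_SIZE = 4    # hypothetical name: marbles per draw

# One list holds every result, so no separate variable per test is needed.
results = [random.choices(marbles, k=DRAW_SIZE) for _ in range(NUM_TESTS)]

# Print each draw with its test number.
for number, draw in enumerate(results, start=1):
    print(number, draw)
```

Keeping the draws in a single list also means they can be counted or filtered later without running the test again.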

CodePudding user response:

Sampling with and without replacement

It's important to understand the difference between sampling with replacement and without replacement. Say we have a bag of 1 blue and 2 red marbles, and you select 2 marbles. If you put the first marble back before pulling the second, it's possible to end up with 2 blue marbles. This is called sampling with replacement. If you keep the first marble out of the bag, at most one of the two marbles can be blue. This is called sampling without replacement. Using random.choice is sampling with replacement.
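
The difference can be checked without any randomness by listing every possible ordered pair for the bag of 1 blue and 2 red marbles. This is only an illustrative sketch using itertools; the variable names are made up for the example.

```python
from itertools import permutations, product

bag = ['blue', 'red', 'red']

# With replacement: every pick sees the full bag, so the same marble can repeat.
with_replacement = set(product(bag, repeat=2))

# Without replacement: a marble that was picked cannot be picked again.
without_replacement = set(permutations(bag, 2))

print(('blue', 'blue') in with_replacement)     # True
print(('blue', 'blue') in without_replacement)  # False
```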

random.choices() and random.sample()

You can pull more than one element using the choices() function from the random module. For example, sampling 4 marbles with replacement from a bag of 1 red and 2 blue marbles looks like this:

>>> import random
>>> marbles = ['red'] * 1 + ['blue'] * 2
>>> random.choices(marbles, k=4)
['red', 'blue', 'blue', 'blue']

To sample without replacement, use the sample() function from the random module:

>>> random.sample(marbles, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py", line 482, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

As expected, this gives an error. You can't draw 4 marbles from a bag of 3. Now if we put 1000 red marbles and 2000 blue marbles in the bag, we get:

>>> marbles = ['red'] * 1000 + ['blue'] * 2000
>>> random.sample(marbles, 4)
['blue', 'blue', 'blue', 'red']

Memory usage and weights

A possible problem with the examples above is that a list with one element per marble needs more and more memory as the number of marbles grows. To avoid this, the choices() function has a weights parameter, so each color only has to be listed once. You can use it like this:

>>> marbles = ['red', 'blue']
>>> weights = [1000, 2000]
>>> random.choices(marbles, weights=weights, k=4)
['blue', 'blue', 'blue', 'red']
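
To see that the weights behave like the 1000 red and 2000 blue marbles they stand for, one can draw a large sample and look at the share of blue, which should land close to 2/3. This is a rough sketch; the sample size of 30,000 and the seed value are arbitrary choices made for the example.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary fixed seed so that reruns give the same draws

marbles = ['red', 'blue']
weights = [1000, 2000]

# Draw many marbles with replacement, using only the two-element list.
draws = random.choices(marbles, weights=weights, k=30_000)

# Tally the colors and compute the fraction of blue draws.
counts = Counter(draws)
blue_share = counts['blue'] / len(draws)
print(f'blue share: {blue_share:.3f}')
```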

Sadly, the random module doesn't have a function for sampling without replacement using arbitrary (non-integer) weights.
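
When the weights are whole-number marble counts, as they are here, random.sample() accepts a counts keyword argument in Python 3.9 and later. It samples without replacement as if each element were repeated that many times, without building the long list. A minimal sketch, assuming Python 3.9 or newer; the variable names are illustrative.

```python
import random

colors = ['red', 'blue']
counts = [1000, 2000]  # 1000 red and 2000 blue marbles, never materialized

# Sample 4 marbles without replacement from the implied bag of 3000.
draw = random.sample(colors, counts=counts, k=4)
print(draw)

# Asking for more marbles than the counts add up to still fails.
too_large_failed = False
try:
    random.sample(colors, counts=[1, 2], k=4)
except ValueError as error:
    too_large_failed = True
    print('ValueError:', error)
```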

Repeated sampling using for loop

Finally, we need to count the outcomes. A more advanced way to do this is using dictionaries and defaultdict from the collections module. As an alternative, we will create a list of outcomes, and loop through the different outcomes using a set of that list.

import random

SAMPLE_SIZE = 4
REPEAT_SAMPLING = 100

outcomes = []
marbles = ['red'] * 5000 + ['blue'] * 5000

for i in range(REPEAT_SAMPLING):
    outcome = ', '.join(random.sample(marbles, SAMPLE_SIZE))
    outcomes.append(outcome)

for outcome in set(outcomes):
    print(f'{outcome} appeared {outcomes.count(outcome)} times out of {REPEAT_SAMPLING}')
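
The dictionary-based approach mentioned above could look like the sketch below. It uses collections.Counter rather than defaultdict, and it keys each draw by its number of red marbles, so 'red, blue, blue, blue' and 'blue, red, blue, blue' are counted as the same outcome. Both choices are assumptions made for this example, not part of the code above.

```python
import random
from collections import Counter

SAMPLE_SIZE = 4
REPEAT_SAMPLING = 100

marbles = ['red'] * 5000 + ['blue'] * 5000

# Key each draw by how many red marbles it contains, so order is ignored.
red_counts = Counter(
    random.sample(marbles, SAMPLE_SIZE).count('red')
    for _ in range(REPEAT_SAMPLING)
)

# Report the outcomes from fewest to most red marbles.
for reds in sorted(red_counts):
    blues = SAMPLE_SIZE - reds
    print(f'{reds} red, {blues} blue appeared {red_counts[reds]} times out of {REPEAT_SAMPLING}')
```

Counter tallies in a single pass, whereas calling outcomes.count() once per distinct outcome rescans the whole list each time.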