How to find probability without replacement using only pandas and random?

Time:10-11

The simulation sometimes produces "square" for both shape1 and shape2, which shouldn't happen when drawing without replacement. The code runs without errors, but I don't know how to prevent the same shape from being picked twice.

import pandas as pd
import random

Creating the simulation:

def simulation1():
    shape1 = random.choice(["square", "pentagon", "octagon"])
    return shape1
def simulation2():
    shape2 = random.choice(["square", "pentagon", "octagon"])
    return shape2

Storing the simulation as a dataframe:

data = []
for i in range(1000):
    d = {"shape1": simulation1(), "shape2": simulation2()}
    data.append(d)
    
df = pd.DataFrame(data)

CodePudding user response:

This should work. You just need to remove the first pick from the list before making the second selection. There are many other ways to implement this.

import pandas as pd
import random

def simulation(choices):
    shape = random.choice(choices)
    return shape

data = []
for i in range(1000):
    d = {}
    x = ["square", "pentagon", "octagon"] #fresh list on every iteration
    d['shape1'] = simulation(x) #item 1 is selected
    x.remove(d['shape1']) #item selected previously is removed from the list
    d['shape2'] = simulation(x) #item 2 is selected without item 1 in the list
    data.append(d) #the list is rebuilt at the top of the next iteration
    
df = pd.DataFrame(data)

The df looks like this:

     shape1    shape2
0    octagon   pentagon
1    pentagon  octagon
2    pentagon  octagon
3    octagon   square
4    square    octagon
...  ...       ...
995  octagon   square
996  octagon   pentagon
997  pentagon  octagon
998  square    octagon
999  square    pentagon

You can test whether there are any duplicates like this:

import numpy as np #needed for np.where

df['duplicates'] = np.where(df.shape1 == df.shape2, 1, 0) #1 if both columns hold the same shape, else 0
df.duplicates.sum() #sum the duplicates column

If the sum is greater than 0, there are duplicates. I got 0, so there are none.
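
Since the question asks for only pandas and random, the same duplicate check can probably be done without NumPy by summing a boolean comparison directly. The snippet below is a minimal sketch under that assumption; the fixed seed and the names rows, is_duplicate and n_duplicates are illustrative choices, not part of the original answer.

```python
import random
import pandas as pd

random.seed(0)  # assumption: seed fixed only so the sketch is repeatable

rows = []
for _ in range(1000):
    x = ["square", "pentagon", "octagon"]  # fresh list on every iteration
    first = random.choice(x)               # first draw
    x.remove(first)                        # take the first draw out of the pool
    second = random.choice(x)              # second draw, without replacement
    rows.append({"shape1": first, "shape2": second})

df = pd.DataFrame(rows)

# Boolean Series: True where both draws landed on the same shape
is_duplicate = df["shape1"] == df["shape2"]

# True counts as 1 when summed, so this is the number of duplicate rows
n_duplicates = int(is_duplicate.sum())
print(n_duplicates)
```

This avoids adding a helper column to df, which keeps the frame limited to the two simulated draws.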

You don't even have to define a function; you can do this instead and get the same result.

import pandas as pd
import random

data = []
for i in range(1000):
    d = {}
    x = ["square", "pentagon", "octagon"] 
    d['shape1'] = random.choice(x) #item 1 is selected
    x.remove(d['shape1']) #item selected previously is removed from the list
    d['shape2'] = random.choice(x) #item 2 is selected without item 1 in the list
    data.append(d) #items in list are reset in the next loop
    
df = pd.DataFrame(data)

Finally, you can use random.sample(), a function that was built for random sampling without replacement.

import random
import pandas as pd

data = []
x = ["square", "pentagon", "octagon"]
for i in range(1000):
    shapes = random.sample(x, k=2) #k=2 picks 2 items without replacement and returns them as a list
    data.append({"shape1": shapes[0], "shape2": shapes[1]})

df = pd.DataFrame(data)

CodePudding user response:

random.sample() is more appropriate here than random.choice(). From its help text:

sample(population, k, *, counts=None) method of random.Random instance
    Chooses k unique random elements from a population sequence or set.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.
    ...

import random
import pandas as pd

x = ["square", "pentagon", "octagon"]
d = []
for _ in range(1000):
    shapes = random.sample(x, k=2)
    d.append({"shape1": shapes[0], "shape2": shapes[1]})

df = pd.DataFrame(d)
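
To get from the simulated frame to the probabilities the question asks about, one option is to turn counts into relative frequencies with pandas. The following is a rough sketch, not the only way to do it: the seed, the larger sample size of 10,000, and the names p_first and p_second_given_first are assumptions made for illustration.

```python
import random
import pandas as pd

random.seed(42)  # assumption: seed fixed only so the sketch is repeatable

x = ["square", "pentagon", "octagon"]
rows = []
for _ in range(10_000):  # assumption: more draws than the original 1000, for steadier estimates
    shapes = random.sample(x, k=2)  # two distinct shapes per draw
    rows.append({"shape1": shapes[0], "shape2": shapes[1]})

df = pd.DataFrame(rows)

# Estimated P(shape1 = s): relative frequency of each first pick
p_first = df["shape1"].value_counts(normalize=True)

# Estimated P(shape2 | shape1): normalize="index" makes each row sum to 1
p_second_given_first = pd.crosstab(df["shape1"], df["shape2"], normalize="index")

print(p_first)
print(p_second_given_first)
```

With three shapes, the first-pick frequencies should each land near 1/3. In the conditional table the diagonal should be exactly 0, because a shape cannot be drawn twice, and the off-diagonal entries should sit near 1/2.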