In my simulation, some rows have the same shape (for example, square) as both shape1 and shape2, which shouldn't happen when sampling without replacement. The code runs without errors, but I don't know how to prevent this.
import pandas as pd
import random
# creating the simulation
def simulation1():
    shape1 = random.choice(["square", "pentagon", "octagon"])
    return shape1
def simulation2():
    shape2 = random.choice(["square", "pentagon", "octagon"])
    return shape2
# storing the simulation as a dataframe
data = []
for i in range(1000):
    d = {"shape1": simulation1(), "shape2": simulation2()}
    data.append(d)
df = pd.DataFrame(data)
CodePudding user response:
This should work. You just need to remove the chosen item from the list after the first selection. There are, of course, many other ways to implement this.
import pandas as pd
import random
def simulation(choices):
    shape = random.choice(choices)
    return shape
data = []
for i in range(1000):
    d = {}
    x = ["square", "pentagon", "octagon"]
    d['shape1'] = simulation(x)  # item 1 is selected
    x.remove(d['shape1'])  # item selected previously is removed from the list
    d['shape2'] = simulation(x)  # item 2 is selected without item 1 in the list
    data.append(d)  # items in list are reset in the next loop
df = pd.DataFrame(data)
The df looks like this,
       shape1    shape2
0     octagon  pentagon
1    pentagon   octagon
2    pentagon   octagon
3     octagon    square
4      square   octagon
...       ...       ...
995   octagon    square
996   octagon  pentagon
997  pentagon   octagon
998    square   octagon
999    square  pentagon
You can test whether there are any duplicates like this (note that it needs NumPy):
import numpy as np
df['duplicates'] = np.where(df.shape1 == df.shape2, 1, 0)  # 1 if shape1 and shape2 match, else 0
df.duplicates.sum()  # sum the new third column
If the sum is greater than 0, there are duplicates. I got 0, so there are none.
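The same check can also be written without NumPy, because comparing two columns gives a boolean Series whose sum counts the matching rows. Here is a minimal sketch that rebuilds the frame with the list-removal approach so it runs on its own; the names rows and n_duplicates are only illustrative.

```python
import random

import pandas as pd

# Rebuild the frame using the list-removal approach from above.
rows = []
for _ in range(1000):
    choices = ["square", "pentagon", "octagon"]
    first = random.choice(choices)
    choices.remove(first)  # the second draw can no longer match the first
    rows.append({"shape1": first, "shape2": random.choice(choices)})
df = pd.DataFrame(rows)

# Comparing the columns gives a boolean Series; True counts as 1 when summed.
n_duplicates = int((df["shape1"] == df["shape2"]).sum())
print(n_duplicates)
```

This avoids adding a helper column to df, which may be preferable if the frame is used for further analysis afterwards.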
You don't even have to define a function; you can do this instead and get the same result.
import pandas as pd
import random
data = []
for i in range(1000):
    d = {}
    x = ["square", "pentagon", "octagon"]
    d['shape1'] = random.choice(x)  # item 1 is selected
    x.remove(d['shape1'])  # item selected previously is removed from the list
    d['shape2'] = random.choice(x)  # item 2 is selected without item 1 in the list
    data.append(d)  # items in list are reset in the next loop
df = pd.DataFrame(data)
Finally, you can use random.sample(), a function that was built for random sampling without replacement.
import random
import pandas as pd
data = []
x = ["square", "pentagon", "octagon"]
for i in range(1000):
    shapes = random.sample(x, k=2)  # k=2 picks 2 items without replacement and returns them as a list
    data.append({"shape1": shapes[0], "shape2": shapes[1]})
df = pd.DataFrame(data)
CodePudding user response:
random.sample() is more appropriate here than random.choice().
sample(population, k, *, counts=None) method of random.Random instance
    Chooses k unique random elements from a population sequence or set.
    Returns a new list containing elements from the population while leaving the original population unchanged. ...
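The two properties described in that docstring (distinct picks, original list left alone) can be checked directly. A small sketch, where population, before and picked are just illustrative names:

```python
import random

population = ["square", "pentagon", "octagon"]
before = list(population)  # copy to compare against afterwards

picked = random.sample(population, k=2)

print(picked)                 # two shapes in random order
print(len(set(picked)) == 2)  # True: the two picks are distinct
print(population == before)   # True: the original list is untouched
```

Because the list is not modified, it can be defined once outside the loop instead of being rebuilt on every iteration.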
import random
import pandas as pd
x = ["square", "pentagon", "octagon"]
d = []
for _ in range(1000):
    shapes = random.sample(x, k=2)
    d.append({"shape1": shapes[0], "shape2": shapes[1]})
df = pd.DataFrame(d)
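If the simulation needs to be repeatable, the same idea can be wrapped in a small function that takes a seed. This is only a sketch: draw_pairs, SHAPES and the seed parameter are hypothetical names that do not appear in the answers above.

```python
import random

SHAPES = ["square", "pentagon", "octagon"]

def draw_pairs(n, seed=None):
    """Return n dicts with two distinct shapes each, drawn without replacement."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    rows = []
    for _ in range(n):
        shape1, shape2 = rng.sample(SHAPES, k=2)
        rows.append({"shape1": shape1, "shape2": shape2})
    return rows

rows = draw_pairs(1000, seed=42)
# No row should have the same shape in both columns.
print(all(row["shape1"] != row["shape2"] for row in rows))
```

The returned list has the same structure as d above, so it can be passed straight to pd.DataFrame(rows).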