I am trying to generate dummy data with given probabilities. Let's say I want dummy data about people by gender. I have already done this in R, and you can see my code below.
gender = sample(x=c("M","F"), prob = c(.6, .4),size=100,replace=TRUE)
Now I want to do the same thing in Python, in a pandas DataFrame. Can anybody help me solve this problem?
CodePudding user response:
You can use numpy.random.choice; replace is True by default.
>>> import numpy as np
>>> np.random.choice(a=["M", "F"], size=100, p=[0.6, 0.4])
array(['F', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'F', 'M',
'M', 'F', 'M', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'F',
'F', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F',
'M', 'M', 'F', 'F', 'M', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M',
'F', 'F', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'F', 'M',
'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'F', 'F',
'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'F',
'F', 'F', 'F', 'F', 'M', 'M', 'F', 'F', 'F'], dtype='<U1')
CodePudding user response:
Try this. random.choices gets k choices from the iterable provided:
import random
print(random.choices("MF", weights=[.6,.4], k=100))
Testing:
>>> l = random.choices("MF", weights=[.6,.4], k=100)
>>> l
['M', 'F', 'F', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'M', 'F', 'F', 'M', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'M']
>>> l.count("M")
60
>>> l.count("F")
40
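Note that the exact 60/40 split in the test above is a coincidence of that particular run: the weights set probabilities, not guaranteed counts. A small sketch for checking the counts on any run, assuming a seeded random.Random instance for reproducibility (the seed value is arbitrary):

```python
import random
from collections import Counter

# Separate, seeded generator so repeated runs give the same sample.
rng = random.Random(0)

# Draw 100 characters from "MF" with weights 0.6 and 0.4 (with replacement).
sample = rng.choices("MF", weights=[0.6, 0.4], k=100)

# Tally how many of each label were drawn; the counts vary with the seed.
counts = Counter(sample)
print(counts)
```

The resulting list can be passed to pandas as a column, for example pd.DataFrame({"gender": sample}).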