Let's say I want to sample 0 with probability p0 = 0.5, 1 with probability p1 = 0.3, or 2 with probability p2 = 0.2. This is fairly simple to do:
import numpy as np
p0 = 0.5
p1 = 0.3
p2 = 0.2
idx = np.random.choice(3, p=[p0, p1, p2])
Now, let's say I want to repeat this process N times, each time using different probabilities. Something like:
N = 4
p0 = np.array([0.5, 0.6, 0.7, 0.8])
p1 = np.array([0.3, 0.2, 0.2, 0.1])
p2 = np.array([0.2, 0.2, 0.1, 0.1])
idx = np.empty(N, dtype=int)  # integer array, since the samples are category indices
for i in range(N):
    idx[i] = np.random.choice(3, p=[p0[i], p1[i], p2[i]])
However, this is obviously slow. Ideally I'd like to do this avoiding loops. Is there a simple solution to this problem?
CodePudding user response:
One way is to generate a uniform random array of size N, compare it to the cumulative probabilities, then take the index of the first True value in each column:
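# Stack the probabilities into shape (3, N) and accumulate down each column
cum_probs = np.cumsum([p0, p1, p2], axis=0)
# argmax on a boolean array returns the position of the first True in each column
idx = np.argmax(np.random.uniform(size=N) < cum_probs, axis=0)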
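For reference, the two lines above can be put together as a self-contained sketch. The function name `sample_categorical_rows`, the use of `np.random.default_rng` with a fixed seed, and the step that forces the last cumulative row to exactly 1.0 are assumptions added here for illustration; they are not part of the original answer.

```python
import numpy as np

def sample_categorical_rows(probs, rng):
    """Draw one category index per column of `probs` (shape: K x N).

    Hypothetical helper: each column of `probs` is assumed to sum to 1.
    """
    cum_probs = np.cumsum(probs, axis=0)
    # Assumption: pin the last row to 1.0 so floating-point round-off in the
    # cumulative sum can never leave a column without a True entry.
    cum_probs[-1] = 1.0
    # One uniform draw in [0, 1) per column; it broadcasts against (K, N).
    u = rng.uniform(size=probs.shape[1])
    # The first row where u < cumulative probability is the sampled index.
    return np.argmax(u < cum_probs, axis=0)

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
probs = np.array([
    [0.5, 0.6, 0.7, 0.8],  # p0
    [0.3, 0.2, 0.2, 0.1],  # p1
    [0.2, 0.2, 0.1, 0.1],  # p2
])
idx = sample_categorical_rows(probs, rng)
print(idx)  # four integers, each 0, 1, or 2
```

This works because a uniform draw lands in the interval between two consecutive cumulative probabilities with probability equal to that interval's width, which is exactly the probability of the corresponding category.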