Np Random Choice with list of probability distributions

I have a set of actions [0, 1, 2, 3] and a policy, which is a list of probability distributions over those actions, like [[0.5, 0.4, 0.05, 0.05], ...].

How can I use np.random.choice (or something similar) to pick one element of my actions array for each probability distribution and return the list of choices?

For a concrete example:

actions = [0, 1, 2, 3]
probs = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]]

*magic*

output = [0, 3]

Edit: Sorry, I wasn't clear before. I am looking for an efficient way to do this, without a loop if possible. My current code uses a loop, which makes generating many episodes at once extremely slow.

Answer:

Just call rng.choice for each row separately.

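import numpy as np
rng = np.random.default_rng()
# rng.choice returns an index per row; it equals the action because actions == [0, 1, 2, 3]
output = [rng.choice(len(actions), p=x) for x in probs]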
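Because rng.choice(len(actions), ...) returns an index, it only coincides with the action when the actions are exactly 0..n-1. The sketch below assumes hypothetical string labels to show how the index maps back to an arbitrary action list, and seeds the generator so the sampled episodes are reproducible. It still loops over the rows in Python.

```python
import numpy as np

# Hypothetical action labels (the question uses [0, 1, 2, 3])
actions = ["left", "right", "up", "down"]
probs = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]]

# A fixed seed makes repeated runs draw the same actions
rng = np.random.default_rng(seed=0)

# Sample an index for each distribution, then look up the corresponding label
output = [actions[rng.choice(len(actions), p=row)] for row in probs]
print(output)
```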

Answer:

np.random.choice accepts a p parameter that does what you want:

p: 1-D array-like, optional

The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Then, you can achieve the desired output by:

import numpy as np

actions = [0, 1, 2, 3]
probs = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]]

output = [np.random.choice(actions, p=prob) for prob in probs]
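Both answers above still loop over the rows in Python, which the Edit asks to avoid. One common loop-free technique (not taken from either answer) is inverse-CDF sampling: take the cumulative sum of each row, draw one uniform number per row, and count how many cumulative values fall at or below it. The helper name sample_rows below is hypothetical; this is a minimal sketch assuming probs is a 2-D array whose rows each sum to 1.

```python
import numpy as np

def sample_rows(probs, rng=None):
    """Draw one column index per row of a 2-D probability matrix, without a Python loop.

    Hypothetical helper illustrating inverse-CDF sampling; assumes each row sums to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    # Cumulative distribution along each row, e.g. [0.5, 0.9, 0.95, 1.0]
    cdf = probs.cumsum(axis=1)
    # One uniform draw in [0, 1) per row, shaped (n_rows, 1) so it broadcasts across columns
    u = rng.random((probs.shape[0], 1))
    # The sampled index is the number of CDF entries <= u (zero-probability columns are skipped)
    idx = (cdf <= u).sum(axis=1)
    # Round-off can leave the last CDF value slightly below 1; clamp to a valid column
    return np.minimum(idx, probs.shape[1] - 1)

actions = np.array([0, 1, 2, 3])
probs = np.array([[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]])
output = actions[sample_rows(probs)]
print(output)
```

All rows are handled in a few array operations, so the cost grows with rows × actions inside NumPy rather than in the Python interpreter. Indexing actions with the returned indices also works for non-integer labels stored in a NumPy array.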