I have a set of actions [0,1,2,3] and a policy given as a list of probability distributions over those actions, like [[0.5, 0.4, 0.05, 0.05]...].
How can I use np.random.choice (or something similar) to draw one action from my actions array for each probability distribution and get back the list of choices?
For a concrete example:
actions = [0, 1, 2, 3]
probs = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]]
*magic*
output = [0, 3]
Edit: Sorry I wasn't clear before. I am looking for an efficient way to do this, without a loop if possible. My current code uses a loop, which makes generating many episodes at a time extremely slow.
CodePudding user response:
Just call rng.choice for each row separately.
import numpy as np
rng = np.random.default_rng()
# Each draw is an index into actions, which here equals the action itself.
output = [rng.choice(len(actions), p=x) for x in probs]
CodePudding user response:
np.random.choice can take in a p parameter that does what you want:
p: 1-D array-like, optional
The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
Then, you can achieve the desired output by:
import numpy as np
actions = [0, 1, 2, 3]
probs = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]]
output = [np.random.choice(actions, p=prob) for prob in probs]
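Both snippets above still loop over the rows in Python, which is what the edit to the question is trying to avoid. One loop-free option is inverse-CDF sampling: take the cumulative sum of each row, draw one uniform number per row, and count how many cumulative values that number has passed. The following is a minimal sketch, not a NumPy built-in. The helper name sample_actions is made up for illustration, and it assumes every row of probs is non-negative and sums to 1.

```python
import numpy as np

def sample_actions(actions, probs, rng=None):
    """Draw one action per row of probs without a Python-level loop.

    Hypothetical helper; assumes each row of probs sums to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = np.asarray(actions)
    probs = np.asarray(probs, dtype=float)
    # Row-wise cumulative distribution, shape (n_rows, n_actions).
    cdf = np.cumsum(probs, axis=1)
    # One uniform draw in [0, 1) per row, shaped to broadcast against cdf.
    u = rng.random((probs.shape[0], 1))
    # The sampled index is the number of cdf entries that u has reached.
    idx = (u >= cdf).sum(axis=1)
    # Guard against rounding where a row's final cdf value is slightly below 1.
    idx = np.minimum(idx, len(actions) - 1)
    return actions[idx]

actions = [0, 1, 2, 3]
probs = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.1, 0.8]]
output = sample_actions(actions, probs)  # array of 2 sampled actions
```

The cost is one cumsum and one comparison over the whole (n_rows, n_actions) array, so it should scale much better than calling choice once per row when there are many rows. If reproducibility matters, pass a seeded generator, e.g. sample_actions(actions, probs, rng=np.random.default_rng(0)).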