I am implementing a variant of the ants algorithm on a discrete 2D grid. The design is similar to how Reinforcement Learning environments work: there is a state variable, and action(s) that transition the environment from one state to another. The state of the environment is, for each ant, the integer count of how many times each of the four surrounding cells has been visited. For example, with 2 ants:
>>> state
array([[18,  5,  0, 21],   # ant 0: right (visited 18 times), up (5), left (0, not visited), down (21)
       [ 6, 32,  6, 12]])  # ant 1
Now, I want each ant to move toward its least-visited neighboring cell(s), breaking ties with equal probability. So ant 0 will go left with probability 1, and ant 1 will choose between right and left with equal probability.
I have been accomplishing this using a for loop over the agents, like so:
def choose_action(self, state: np.ndarray) -> np.ndarray:
    """
    Choose the ants' actions based on the state of the environment.
    :param state: the integer visit counts of the four surrounding cells for each ant (n_ants x 4)
    :return: actions' indices for the ants
    """
    actions = np.zeros(self.n_ants, dtype=int)
    for i in range(self.n_ants):
        mask = state[i] == state[i].min()  # use a local mask so the caller's state is not mutated
        p = mask / mask.sum()
        actions[i] = np.random.choice(self.n_actions, p=p)
    return actions
But is there a way to do this without a for loop?
Note: action indices correspond to the order of the directions in the state variable; action index 0 moves the agent to the right, and so on.
CodePudding user response:
The calculation itself can be vectorized, but I haven't found a way to avoid a loop when drawing the random choices:
def choose_action(self, state: np.ndarray) -> np.ndarray:
    mask = state == state.min(1, keepdims=True)
    probs = mask / mask.sum(1, keepdims=True)
    gen_actions = (np.random.choice(self.n_actions, p=p) for p in probs)
    actions = np.fromiter(gen_actions, int, self.n_ants)
    return actions
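For completeness, here is a self-contained sketch of this half-vectorized version on the example state from the question (a standalone function with `n_actions` passed as a parameter, both assumptions for the demo, since the original is a method on a class):

```python
import numpy as np

def choose_action(state: np.ndarray, n_actions: int = 4) -> np.ndarray:
    # Mark the least-visited cell(s) in each row, normalize the mask to
    # per-ant probabilities, then draw one action per row; only the random
    # draw itself still loops (inside the generator expression).
    mask = state == state.min(1, keepdims=True)
    probs = mask / mask.sum(1, keepdims=True)
    gen_actions = (np.random.choice(n_actions, p=p) for p in probs)
    return np.fromiter(gen_actions, int, len(state))

state = np.array([[18, 5, 0, 21],
                  [6, 32, 6, 12]])
actions = choose_action(state)
# Ant 0 always picks index 2 (left); ant 1 picks 0 or 2 with equal probability.
```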
CodePudding user response:
I finally found the answer here: what I am asking for is an argmax with random tie-breaking. With a minor modification to that answer, the choose_action method becomes:
def choose_action(self, state: np.ndarray) -> np.ndarray:
    """
    Choose the ants' actions based on the state of the environment.
    :param state: the integer visit counts of the four surrounding cells for each ant (n_ants x 4)
    :return: actions' indices for the ants
    """
    rands = np.random.random(state.shape)
    mask = state == state.min(axis=1, keepdims=True)
    return np.argmax(rands * mask, axis=1)
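A quick empirical check of the tie-breaking (standalone function, my own demo names): multiplying uniform noise by the boolean minimum mask zeroes out the non-minimal cells, so argmax picks uniformly at random among the tied minima.

```python
import numpy as np

def choose_action(state: np.ndarray) -> np.ndarray:
    # Uniform noise in [0, 1) times the minimum mask: non-minimal cells become 0,
    # tied minima get independent positive scores, so argmax breaks ties uniformly.
    rands = np.random.random(state.shape)
    mask = state == state.min(axis=1, keepdims=True)
    return np.argmax(rands * mask, axis=1)

# Ant 1 from the question has two tied minima (indices 0 and 2).
trials = np.array([choose_action(np.array([[6, 32, 6, 12]]))[0]
                   for _ in range(10_000)])
freq_right = np.mean(trials == 0)  # should be close to 0.5
freq_left = np.mean(trials == 2)   # should be close to 0.5
```

Note that the whole batch of ants goes through a single `argmax` call, so this version also removes the per-ant random draw that the half-vectorized answer still needed.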