I'm working with the Cooperative Push Block environment (https://github.com/Unity-Technologi...nvironment-Examples.md#cooperative-push-block), exported so I can use it through the Python API, on the latest stable version. The issue is that I never receive any reward, positive or negative; it is always 0. If I export the single-agent Push Block environment instead, I receive the rewards correctly. Below is the code I'm using, adapted from the Colab example at https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Python-API.md:
decision_steps, terminal_steps = env.get_steps(behavior_name)
if tracked_agent in decision_steps:
    episode_rewards = decision_steps[tracked_agent].reward
    print('REWARD', decision_steps.reward)  # Always 0
    # Each decision_steps[tracked_agent].reward also returns 0
According to the docs I should be receiving either a small existential penalty (-0.0001 per step) or a positive reward of 1, 2, or 3 depending on the block pushed into the goal. Even when the agents randomly push a block in, I still receive 0.
The docs say the reward is given as a "group reward". I don't know whether that requires a change to the code above.
CodePudding user response:
I received this answer in the Unity ML-Agents GitHub issues section:
The DecisionStep also has a group_reward field which is separate from the reward field. The group rewards given to the Cooperative Push Block agents should be here. We apologize that the Colab doesn't point this out explicitly and I will make an update to it.
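In practice that means reading group_reward instead of (or in addition to) reward. Here is a minimal sketch, assuming env, behavior_name, and tracked_agent are already set up as in the Colab example:

decision_steps, terminal_steps = env.get_steps(behavior_name)

if tracked_agent in decision_steps:
    step = decision_steps[tracked_agent]
    # Individual reward: stays 0 for the cooperative Push Block agents.
    print('REWARD', step.reward)
    # Group reward: this is where the -0.0001 step penalty and the
    # 1/2/3 block rewards show up.
    print('GROUP REWARD', step.group_reward)

if tracked_agent in terminal_steps:
    # TerminalStep carries a group_reward field as well.
    print('FINAL GROUP REWARD', terminal_steps[tracked_agent].group_reward)

The batched DecisionSteps and TerminalSteps objects also expose a group_reward array (one entry per agent), mirroring the existing reward array, so you can sum or log rewards for all agents at once.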