How do you use OpenAI Gym 'wrappers' with a custom Gym environment in Ray Tune?
Let's say I built a Python class called CustomEnv (similar to the 'CartPoleEnv' class used to create the OpenAI Gym "CartPole-v1" environment) to create my own (custom) reinforcement learning environment, and I am using tune.run() from Ray Tune (in Ray 2.1.0 with Python 3.9.15) to train an agent in my environment using the 'PPO' algorithm:
import ray
from ray import tune
from my_file import CustomEnv  # my custom environment class (defined elsewhere)

tune.run(
    "PPO",  # 'PPO' algorithm
    config={
        "env": CustomEnv,  # custom class used to create an environment
        "framework": "tf2",
        "evaluation_interval": 100,
        "evaluation_duration": 100,
    },
    checkpoint_freq=100,  # Save checkpoint at every evaluation
    local_dir=checkpoint_dir,  # Save results to a local directory
    stop={"episode_reward_mean": 250},  # Stopping criterion
)
This works fine, and I can use TensorBoard to monitor training progress, etc., but as it turns out, learning is slow, so I want to try using 'wrappers' from Gym to scale observations, rewards, and/or actions, limit variance, and speed up learning. So I've got an ObservationWrapper, a RewardWrapper, and an ActionWrapper to do that -- for example, something like this (the exact nature of the scaling is not central to my question):
import gym

class ObservationWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.o_min = 0.
        self.o_max = 5000.

    def observation(self, ob):
        # Normalize observations
        ob = (ob - self.o_min) / (self.o_max - self.o_min)
        return ob

class RewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.r_min = -500
        self.r_max = 100

    def reward(self, reward):
        # Scale rewards
        reward = reward / (self.r_max - self.r_min)
        return reward

class ActionWrapper(gym.ActionWrapper):
    def __init__(self, env):
        super().__init__(env)

    def action(self, action):
        # Scale actions
        action = action / 10
        return action
Wrappers like these work fine with my custom class when I create an instance of the class on my local machine and use it in traditional training loops, like this:
from my_file import CustomEnv

env = CustomEnv()
wrapped_env = ObservationWrapper(RewardWrapper(ActionWrapper(env)))

episodes = 10
for episode in range(1, episodes + 1):
    obs = wrapped_env.reset()
    done = False
    score = 0
    while not done:
        action = wrapped_env.action_space.sample()
        obs, reward, done, info = wrapped_env.step(action)
        score += reward
    print(f'Episode: {episode}, Score: {score:.3f}')
My question is: How can I use wrappers like these with my custom class (CustomEnv) and tune.run()? That method expects the value for "env" to be passed either (1) as a class (such as CustomEnv) or (2) as a string associated with a registered Gym environment (such as "CartPole-v1"), as I found out while trying various incorrect ways to pass a wrapped version of my custom class:
ValueError: >>> is an invalid env specifier. You can specify a custom env as either a class (e.g., YourEnvCls) or a registered env id (e.g., "your_env").
So I am not sure how to do it (assuming it is possible). I would prefer to solve this problem without having to register my custom Gym environment, but I am open to any solution.
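For example, one of the incorrect variations I tried (reconstructed here for illustration) was to pass an already-wrapped instance rather than a class or a registered id, which fails with the error above:

import ray
from ray import tune
from my_file import CustomEnv

# Incorrect: "env" receives a wrapped *instance*, not a class or a registered
# environment id, so RLlib rejects it with the ValueError shown above.
tune.run(
    "PPO",
    config={"env": ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))},
)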
In learning about wrappers, I leveraged mostly 'Getting Started With OpenAI Gym: The Basic Building Blocks' by Ayoosh Kathuria, and 'TF 2.0 for Reinforcement Learning: Gym Wrappers'.
CodePudding user response:
I was able to answer my own question about how to get Ray's tune.run() to work with a wrapped custom Gym environment. The documentation for Ray Environments was helpful.
The solution is to register an environment creator function with Ray. Assuming you have defined your Gym wrappers (classes) as discussed above, it works like this:
from ray.tune.registry import register_env
from your_file import CustomEnv  # import your custom class

def env_creator(env_config):
    # Wrap and return an instance of your custom class
    return ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))

# Choose a name and register your custom environment
register_env('WrappedCustomEnv-v0', env_creator)
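Before launching training, you can optionally call the creator yourself to confirm the wrapped environment still behaves like a normal Gym environment (a quick local check; env_creator ignores its env_config argument here, so an empty dict is fine):

# Optional sanity check of the registered creator (not required for training)
test_env = env_creator({})  # env_config is unused by this creator
obs = test_env.reset()
obs, reward, done, info = test_env.step(test_env.action_space.sample())
print(obs, reward, done)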
Now, in tune.run(), you can submit the registered name as you would any other registered Gym environment:
import ray
from ray import tune

tune.run(
    "PPO",  # 'PPO' algorithm (for example)
    config={
        "env": "WrappedCustomEnv-v0",  # the registered environment name
        # other options here as desired
    },
    # other options here as desired
)
tune.run() will work with no errors -- problem solved!
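If you would prefer to avoid register_env entirely, one alternative (an untested sketch, so treat the details as assumptions) is to define a single class whose constructor builds the wrapped environment and pass that class directly as the "env" value; RLlib instantiates the env class with an env_config argument, which this constructor simply accepts and ignores:

import gym
from your_file import CustomEnv  # import your custom class

class WrappedCustomEnv(gym.Wrapper):
    # Builds the fully wrapped environment in the constructor, so the class
    # itself can be passed as config={"env": WrappedCustomEnv, ...}.
    def __init__(self, env_config=None):  # env_config is supplied by RLlib; unused here
        env = ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))
        super().__init__(env)

# tune.run("PPO", config={"env": WrappedCustomEnv, ...})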