How do you use OpenAI Gym 'wrappers' with a custom Gym environment in Ray Tune?
Let's say I built a Python class called CustomEnv (similar to the 'CartPoleEnv' class used to create the OpenAI Gym "CartPole-v1" environment) to create my own (custom) reinforcement learning environment, and I am using tune.run() from Ray Tune (in Ray 2.1.0 with Python 3.9.15) to train an agent in my environment using the 'PPO' algorithm:
import ray
from ray import tune
from my_file import CustomEnv  # my custom environment class (defined elsewhere)

tune.run(
    "PPO",  # 'PPO' algorithm
    config={
        "env": CustomEnv,  # custom class used to create an environment
        "framework": "tf2",
        "evaluation_interval": 100,
        "evaluation_duration": 100,
    },
    checkpoint_freq=100,  # Save checkpoint at every evaluation
    local_dir=checkpoint_dir,  # Save results to a local directory
    stop={"episode_reward_mean": 250},  # Stopping criterion
)
This works fine, and I can use TensorBoard to monitor training progress, etc., but as it turns out, learning is slow, so I want to try using 'wrappers' from Gym to scale observations, rewards, and/or actions, limit variance, and speed up learning. So I've got an ObservationWrapper, a RewardWrapper, and an ActionWrapper to do that -- for example, something like this (the exact nature of the scaling is not central to my question):
import gym

class ObservationWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.o_min = 0.
        self.o_max = 5000.

    def observation(self, ob):
        # Normalize observations
        ob = (ob - self.o_min) / (self.o_max - self.o_min)
        return ob

class RewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.r_min = -500
        self.r_max = 100

    def reward(self, reward):
        # Scale rewards
        reward = reward / (self.r_max - self.r_min)
        return reward

class ActionWrapper(gym.ActionWrapper):
    def __init__(self, env):
        super().__init__(env)

    def action(self, action):
        # Scale actions
        action = action / 10
        return action
Wrappers like these work fine with my custom class when I create an instance of the class on my local machine and use it in traditional training loops, like this:
from my_file import CustomEnv

env = CustomEnv()
wrapped_env = ObservationWrapper(RewardWrapper(ActionWrapper(env)))

episodes = 10
for episode in range(1, episodes + 1):
    obs = wrapped_env.reset()
    done = False
    score = 0
    while not done:
        action = wrapped_env.action_space.sample()
        obs, reward, done, info = wrapped_env.step(action)
        score += reward
    print(f'Episode: {episode}, Score: {score:.3f}')
My question is: How can I use wrappers like these with my custom class (CustomEnv) and tune.run()? That method expects the value for "env" to be passed either (1) as a class (such as CustomEnv) or (2) as a string associated with a registered Gym environment (such as "CartPole-v1"), as I found out while trying various incorrect ways to pass a wrapped version of my custom class:
ValueError: >>> is an invalid env specifier. You can specify a custom env as either a class (e.g., YourEnvCls) or a registered env id (e.g., "your_env").
So I am not sure how to do it (assuming it is possible). I would prefer to solve this problem without having to register my custom Gym environment, but I am open to any solution.
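For example, one of the incorrect variations I tried (reconstructed here for illustration) was to pass an already-wrapped instance rather than a class or a registered id, which fails with the error above:

import ray
from ray import tune
from my_file import CustomEnv

# Incorrect: "env" receives a wrapped *instance*, not a class or a registered
# environment id, so RLlib rejects it with the ValueError shown above.
tune.run(
    "PPO",
    config={"env": ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))},
)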
In learning about wrappers, I leveraged mostly 'Getting Started With OpenAI Gym: The Basic Building Blocks' by Ayoosh Kathuria, and 'TF 2.0 for Reinforcement Learning: Gym Wrappers'.
CodePudding user response:
I was able to answer my own question about how to get Ray's tune.run() to work with a wrapped custom Gym environment. The documentation for Ray Environments was helpful.
The solution is to register an environment creator function with Ray. Assuming you have defined your Gym wrappers (classes) as discussed above, it works like this:
from ray.tune.registry import register_env
from your_file import CustomEnv  # import your custom class

def env_creator(env_config):
    # Wrap and return an instance of your custom class
    return ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))

# Choose a name and register your custom environment
register_env('WrappedCustomEnv-v0', env_creator)
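Before launching training, you can optionally call the creator yourself to confirm the wrapped environment still behaves like a normal Gym environment (a quick local check; env_creator ignores its env_config argument here, so an empty dict is fine):

# Optional sanity check of the registered creator (not required for training)
test_env = env_creator({})  # env_config is unused by this creator
obs = test_env.reset()
obs, reward, done, info = test_env.step(test_env.action_space.sample())
print(obs, reward, done)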
Now, in tune.run(), you can submit the registered name as you would any other registered Gym environment:
import ray
from ray import tune

tune.run(
    "PPO",  # 'PPO' algorithm (for example)
    config={
        "env": "WrappedCustomEnv-v0",  # the registered environment name
        # other options here as desired
    },
    # other options here as desired
)
tune.run() will work with no errors -- problem solved!
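If you would prefer to avoid register_env entirely, one alternative (an untested sketch, so treat the details as assumptions) is to define a single class whose constructor builds the wrapped environment and pass that class directly as the "env" value; RLlib instantiates the env class with an env_config argument, which this constructor simply accepts and ignores:

import gym
from your_file import CustomEnv  # import your custom class

class WrappedCustomEnv(gym.Wrapper):
    # Builds the fully wrapped environment in the constructor, so the class
    # itself can be passed as config={"env": WrappedCustomEnv, ...}.
    def __init__(self, env_config=None):  # env_config is supplied by RLlib; unused here
        env = ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))
        super().__init__(env)

# tune.run("PPO", config={"env": WrappedCustomEnv, ...})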