1. Interacting with Gymnasium environments
Great work so far! Now, it's time to delve into practical applications using the Gymnasium library in Python, which offers a variety of environments for testing and developing RL algorithms. This video will guide us through the process of creating and interacting with these Gymnasium environments.
2. Gymnasium
Gymnasium provides standardized environments for RL tasks,
abstracting the complexity involved in defining RL problems, enabling us to focus on algorithm development.
The Gymnasium library provides a plethora of environments, from classic control tasks to more complex environments like Atari games.
3. Key Gymnasium environments
Some notable environments include
CartPole, where the agent has to keep a pole balanced on a moving cart and prevent it from falling;
MountainCar, challenging the agent to drive up a steep hill by building momentum;
FrozenLake, tasking the agent with finding a safe path across a hole-filled grid;
and Taxi, where the agent picks up and drops off passengers, focusing on efficient route planning and task management.
4. Gymnasium interface
No matter the environment,
the Gymnasium library offers a unified interface for interaction.
This interface includes functions and methods for initializing the environment,
visually representing it,
executing actions,
and observing outcomes.
Let's explore them using the CartPole environment.
5. Creating and initializing the environment
We first import the Gymnasium library as gym.
Then, we create the environment by calling the gym.make() function, passing in the ID of the environment, 'CartPole', along with render_mode='rgb_array', allowing us to visualize the states using Matplotlib. Other render_modes exist, but won't be covered in this course.
Next, env.reset() initializes the environment and returns the initial observation, along with some auxiliary information. The seed argument can be used to ensure reproducibility.
The observation is an array representing the environment's state, including the position and velocity of both the cart and the pole. For other environments, we would need to consult the Gymnasium documentation to understand the details of their states.
6. Visualizing the state
To get a visual representation of the state, the env.render() method returns a state_image that we can display using the plt.imshow() function.
Then, by calling plt.show(), a snapshot of the environment will be displayed.
7. Visualizing the state
To avoid rewriting these three lines of code every time we want to visualize the environment,
we wrap them in a function named render(), which we can then call as needed. This function will be used throughout the course.
8. Performing actions
Now, how do we perform actions? In the CartPole environment, there are two possible actions:
moving the cart to the left, represented by 0,
or to the right, represented by 1.
To execute an action, we call env.step() and pass in the chosen action. This method returns five values: the next state; the reward received; a 'terminated' signal indicating whether the agent has reached a terminal state, such as achieving the goal or losing; a 'truncated' signal showing whether a condition like a time limit has been met; and 'info', which provides auxiliary diagnostic information useful for debugging.
9. Performing actions
In this course, we will mostly focus on the first three returned values. Therefore, we omit 'truncated' and 'info' in our script for simplicity,
and we print the first three returned values after moving the cart to the right. We see the state changed and the agent received a reward of one since it hasn't reached a terminal state yet.
10. Interaction loops
Suppose we want to keep pushing the cart to the right until a termination condition is met, and monitor the environment.
We wrap the previous code in a while loop and render the environment at each iteration.
Here are some chosen plots showing the cart's movement to the right.
11. Let's practice!
Time for some practice!