

Exercise

Defining a deterministic policy

In this exercise, you'll work with a custom environment called MyGridWorld, the same one you've seen in the video. It is a grid world in which the agent's goal is to reach the diamond as quickly as possible. Your task is to define a deterministic policy, one that assigns exactly one action to each state, that guides the agent's behavior as specified in the figure below.

Figure: the policy. States 0, 1, 6, 7 → right; states 2, 3 → down; states 4, 5 → left.

Actions are encoded as integers: 0 → left, 1 → down, 2 → right, 3 → up.
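As a minimal sketch, the policy from the figure can be written as a dictionary keyed by state, with values taken from the action encoding above. The named constants LEFT, DOWN, RIGHT, UP and the ACTION_NAMES lookup are illustrative additions, not part of the exercise; plain integers work just as well.

```python
# Integer codes for the four actions, as listed in the exercise
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# Deterministic policy: each state maps to exactly one action
policy = {
    0: RIGHT, 1: RIGHT,
    2: DOWN, 3: DOWN,
    4: LEFT, 5: LEFT,
    6: RIGHT, 7: RIGHT,
}

# Illustrative helper for printing actions by name
ACTION_NAMES = {LEFT: "left", DOWN: "down", RIGHT: "right", UP: "up"}

for state, action in sorted(policy.items()):
    print(f"state {state}: {ACTION_NAMES[action]} ({action})")
```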

The gymnasium library has been imported for you as gym along with the render() function.

Instructions (step 1 of 2)

  • Create an instance, env, of the environment, passing 'MyGridWorld' as the environment ID and 'rgb_array' as the render_mode.
  • Define the policy shown in the figure as a Python dictionary that maps each state to its action.
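In the exercise itself, env would come from gym.make('MyGridWorld', render_mode='rgb_array'). Because MyGridWorld isn't reproduced here, the sketch below substitutes a hypothetical ToyGridWorld class. It assumes a 3x3 grid with states numbered row by row, the agent starting at state 0, and the diamond at state 8; this layout is inferred from the state numbers in the figure and is not confirmed by the exercise. The class mimics Gymnasium's return conventions (reset() returns (observation, info); step() returns (observation, reward, terminated, truncated, info)) to show how a policy dictionary drives an episode loop.

```python
class ToyGridWorld:
    """Hypothetical stand-in for MyGridWorld (assumed 3x3 layout, goal at 8)."""

    def __init__(self, n_rows=3, n_cols=3, goal=8, max_steps=100):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.goal = goal
        self.max_steps = max_steps  # guards against policies that loop forever

    def reset(self, seed=None):
        self.state, self.steps = 0, 0
        return self.state, {}

    def step(self, action):
        row, col = divmod(self.state, self.n_cols)
        if action == 0:    # left
            col = max(col - 1, 0)
        elif action == 1:  # down
            row = min(row + 1, self.n_rows - 1)
        elif action == 2:  # right
            col = min(col + 1, self.n_cols - 1)
        elif action == 3:  # up
            row = max(row - 1, 0)
        self.state = row * self.n_cols + col
        self.steps += 1
        terminated = self.state == self.goal
        truncated = self.steps >= self.max_steps
        reward = 0.0  # placeholder: the real MyGridWorld rewards aren't given here
        return self.state, reward, terminated, truncated, {}


# Policy from the figure: 0 -> left, 1 -> down, 2 -> right, 3 -> up
policy = {0: 2, 1: 2, 2: 1, 3: 1, 4: 0, 5: 0, 6: 2, 7: 2}

env = ToyGridWorld()
state, info = env.reset()
visited = [state]
terminated = truncated = False
while not (terminated or truncated):
    action = policy[state]  # deterministic: one fixed action per state
    state, reward, terminated, truncated, info = env.step(action)
    visited.append(state)

print(visited)  # → [0, 1, 2, 5, 4, 3, 6, 7, 8]
```

Because the policy is deterministic, every episode started from state 0 follows the same trajectory; the real environment's layout, and therefore the path, may differ from this assumed one.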