Exercise

Defining a deterministic policy

In this exercise, you'll be working with a custom environment called MyGridWorld, the same one you've seen in the video. This environment is a grid world where the agent's goal is to reach the diamond as quickly as possible. Your task is to define a policy that guides the agent's behavior as specified in the figure below.

Image showing the policy: states 0, 1, 6, and 7 take action right; states 2 and 3 take action down; states 4 and 5 take action left.

Actions are represented as: (0 → left, 1 → down, 2 → right, 3 → up).

The gymnasium library has been imported for you as gym along with the render() function.

Instructions 1/2

Create an instance env for the environment using MyGridWorld as an environment ID and 'rgb_array' as render_mode.
Define the policy as shown in the figure as a Python dictionary.
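
A minimal sketch of the two steps above, assuming the action encoding given earlier (0 → left, 1 → down, 2 → right, 3 → up). The constant names and the select_action helper are hypothetical and added only for readability. The gym.make call is left as a comment because MyGridWorld is a custom environment that is only registered inside the course session.

```python
# Action encoding from the exercise: 0 = left, 1 = down, 2 = right, 3 = up.
# These constant names are illustrative, not part of the exercise.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# In the course session (gymnasium imported as gym, MyGridWorld registered):
# env = gym.make('MyGridWorld', render_mode='rgb_array')

# Deterministic policy: exactly one action per state, keyed by state index.
policy = {
    0: RIGHT, 1: RIGHT,
    2: DOWN, 3: DOWN,
    4: LEFT, 5: LEFT,
    6: RIGHT, 7: RIGHT,
}


def select_action(state):
    """Hypothetical helper: return the action the policy prescribes for a state."""
    return policy[state]
```

A dictionary suits a deterministic policy because each state maps to a single action, so choosing the next move is a plain lookup such as policy[state].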
