Defining a deterministic policy
In this exercise, you'll be working with a custom environment called MyGridWorld, the same one you've seen in the video. This environment is a grid world where the agent's goal is to reach the diamond as quickly as possible. Your task is to define a policy that guides the agent's behavior as specified in the figure below.
Actions are represented as: (0 → left, 1 → down, 2 → right, 3 → up).
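A deterministic policy assigns exactly one action to each state, so a plain dictionary from state to action is enough to represent it. The sketch below only illustrates that shape. It assumes a 3x3 grid with states numbered 0 to 8 row by row and the goal in state 8; that layout, the name example_policy, and the actions chosen are hypothetical and are not taken from the exercise figure.

```python
# Action encoding as given in the exercise text.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
ACTION_NAMES = {LEFT: "left", DOWN: "down", RIGHT: "right", UP: "up"}

# Hypothetical layout (an assumption, not the exercise's grid):
# 3x3 grid, states 0..8 numbered row by row, goal in state 8.
# Illustrative policy: move right along each row, then down the last column.
# The goal state has no entry because the episode ends there.
example_policy = {
    0: RIGHT, 1: RIGHT, 2: DOWN,
    3: RIGHT, 4: RIGHT, 5: DOWN,
    6: RIGHT, 7: RIGHT,
}

# Print the policy in readable form.
for state, action in example_policy.items():
    print(f"state {state}: {ACTION_NAMES[action]}")
```

Because the mapping is a dictionary, looking up the action for a state is just example_policy[state], with no randomness involved.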
The gymnasium library has been imported for you as gym, along with the render() function.
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the environment
env = ____
state, info = env.reset()
# Define the policy
policy = ____
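Once a policy is defined, it is typically used inside an interaction loop: look up the action for the current state, step the environment, and repeat until the episode ends. The sketch below shows that loop under stated assumptions. TinyGrid is a hypothetical stand-in written here so the example runs on its own; it is not the MyGridWorld environment, and it only mimics the reset() and step() return shapes used by Gymnasium. The names sketch_env and sketch_policy, the 3x3 layout, and the reward of 1 at the goal are likewise illustrative.

```python
class TinyGrid:
    """Hypothetical 3x3 grid world; NOT the MyGridWorld environment."""

    SIZE = 3
    GOAL = 8
    # (row change, column change) per action: 0 left, 1 down, 2 right, 3 up
    MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

    def reset(self):
        # Start in the top-left cell; return (observation, info).
        self.state = 0
        return self.state, {}

    def step(self, action):
        row, col = divmod(self.state, self.SIZE)
        d_row, d_col = self.MOVES[action]
        # Clamp to the grid so moves into a wall leave the agent in place.
        row = min(max(row + d_row, 0), self.SIZE - 1)
        col = min(max(col + d_col, 0), self.SIZE - 1)
        self.state = row * self.SIZE + col
        terminated = self.state == self.GOAL
        reward = 1 if terminated else 0
        # Same 5-tuple shape as Gymnasium: obs, reward, terminated, truncated, info
        return self.state, reward, terminated, False, {}


# Illustrative deterministic policy for the stand-in grid (state -> action).
sketch_policy = {0: 2, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 2}

sketch_env = TinyGrid()
state, info = sketch_env.reset()
visited = [state]
terminated = truncated = False
while not (terminated or truncated):
    action = sketch_policy[state]  # deterministic: one action per state
    state, reward, terminated, truncated, info = sketch_env.step(action)
    visited.append(state)

print(visited)
```

With the real environment, the same loop shape would apply, with the stand-in replaced by the environment created in the exercise and the policy replaced by the one read off the figure.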