Defining a deterministic policy
In this exercise, you'll be working with a custom environment called MyGridWorld, the same one you've seen in the video. This environment is a grid world where the agent's goal is to reach the diamond as quickly as possible. Your task is to define a policy that guides the agent's behavior as specified in the figure below.
Actions are represented as: (0 → left, 1 → down, 2 → right, 3 → up).
The gymnasium library has been imported for you as gym, along with the render() function.
This exercise is part of the course
Reinforcement Learning with Gymnasium in Python
Hands-on interactive exercise
Try this exercise and complete the sample code.
# Create the environment
env = ____
state, info = env.reset()
# Define the policy
policy = ____
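
To see what a deterministic policy looks like in practice, here is a minimal sketch on a hypothetical 3x3 stand-in grid. It is not MyGridWorld and it is not the solution to this exercise: the grid size, the goal state, and the helper names `step` and `rollout` are all assumptions made for illustration. Only the action encoding (0 → left, 1 → down, 2 → right, 3 → up) comes from the exercise. The idea is that a deterministic policy is a plain mapping from each state to exactly one action, which a dictionary expresses directly.

```python
# Hypothetical 3x3 stand-in grid (NOT MyGridWorld):
# states 0..8 are numbered row by row, and the goal is assumed to be state 8.
N_ROWS, N_COLS, GOAL = 3, 3, 8

# Action encoding from the exercise: 0 -> left, 1 -> down, 2 -> right, 3 -> up
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def step(state, action):
    """Move one cell in the chosen direction, staying put at the grid edges."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    new_row = min(max(row + d_row, 0), N_ROWS - 1)
    new_col = min(max(col + d_col, 0), N_COLS - 1)
    return new_row * N_COLS + new_col

# Deterministic policy: every non-goal state maps to exactly one action.
# Here: go right along each row, and go down once the last column is reached.
policy = {0: 2, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 2}

def rollout(policy, start=0, max_steps=20):
    """Follow the policy from `start` until the goal or the step limit."""
    state, path = start, [start]
    while state != GOAL and len(path) <= max_steps:
        state = step(state, policy[state])
        path.append(state)
    return path

path = rollout(policy)
print(path)  # [0, 1, 2, 5, 8]
```

Because the policy is deterministic, following it from the same start state always produces the same path. In the exercise itself, the states and the action for each one should be read off the figure rather than taken from this sketch.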