Defining a deterministic policy

In this exercise, you'll be working with a custom environment called MyGridWorld, the same one you've seen in the video. This environment is a grid world where the agent's goal is to reach the diamond as quickly as possible. Your task is to define a policy that guides the agent's behavior as specified in the figure below.

Figure: the policy to implement.
States 0, 1, 6, 7: action right.
States 2, 3: action down.
States 4, 5: action left.

Actions are represented as: (0 → left, 1 → down, 2 → right, 3 → up).
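
As a minimal sketch, a deterministic policy like the one described above can be written as a plain dictionary that maps each state to exactly one action. The constant names (LEFT, DOWN, RIGHT, UP) and the ACTION_NAMES helper are illustrative assumptions, not part of the exercise; only the state-to-action pairs and the integer encoding come from the text.

```python
# Action encoding as given in the exercise text.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# Deterministic policy: every listed state maps to exactly one action.
policy = {
    0: RIGHT, 1: RIGHT,   # states 0, 1 -> right
    2: DOWN, 3: DOWN,     # states 2, 3 -> down
    4: LEFT, 5: LEFT,     # states 4, 5 -> left
    6: RIGHT, 7: RIGHT,   # states 6, 7 -> right
}

# Hypothetical helper for readable output (not part of the exercise).
ACTION_NAMES = {LEFT: "left", DOWN: "down", RIGHT: "right", UP: "up"}

for state in sorted(policy):
    print(f"state {state}: {ACTION_NAMES[policy[state]]}")
```

Looking up policy[state] then returns the action to take in that state. A dictionary suits this case because the state space is small and discrete.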

The gymnasium library has been imported for you as gym along with the render() function.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create the environment
env = ____
state, info = env.reset()

# Define the policy
policy = ____
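
To illustrate how such a policy is typically executed, here is a rough sketch that follows the reset/step loop used by Gymnasium environments. The call that creates MyGridWorld is not shown in this exercise, so the sketch uses a hypothetical stand-in class, StandInGridWorld, instead of the real environment. It assumes a 3x3 grid with row-major state numbering, the diamond at state 8, fully deterministic moves, and a placeholder reward; none of these details are confirmed by the exercise text.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # encoding from the exercise text

policy = {0: RIGHT, 1: RIGHT, 2: DOWN, 3: DOWN,
          4: LEFT, 5: LEFT, 6: RIGHT, 7: RIGHT}


class StandInGridWorld:
    """Hypothetical stand-in for MyGridWorld (ASSUMED 3x3 grid, goal at state 8)."""

    n_rows, n_cols, goal = 3, 3, 8

    def reset(self):
        # Mirrors Gymnasium's reset(), which returns (observation, info).
        self.state = 0
        return self.state, {}

    def step(self, action):
        row, col = divmod(self.state, self.n_cols)
        if action == LEFT:
            col = max(col - 1, 0)
        elif action == DOWN:
            row = min(row + 1, self.n_rows - 1)
        elif action == RIGHT:
            col = min(col + 1, self.n_cols - 1)
        elif action == UP:
            row = max(row - 1, 0)
        self.state = row * self.n_cols + col
        terminated = self.state == self.goal
        reward = 1.0 if terminated else 0.0  # placeholder reward, an assumption
        # Mirrors Gymnasium's step(): (obs, reward, terminated, truncated, info).
        return self.state, reward, terminated, False, {}


env = StandInGridWorld()
state, info = env.reset()
visited = [state]

# Follow the policy until the goal is reached; cap the steps as a safety net.
for _ in range(20):
    action = policy[state]
    state, reward, terminated, truncated, info = env.step(action)
    visited.append(state)
    if terminated or truncated:
        break

print(visited)  # [0, 1, 2, 5, 4, 3, 6, 7, 8]
```

Under these assumptions the agent follows a snake-shaped path through every state before reaching the diamond. With the real environment, the same loop would apply once env is created as the exercise instructs.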