
Solving 8x8 Frozen Lake with SARSA

In this exercise, you will apply the SARSA algorithm, using the update_q_table() function you implemented previously, to learn an optimal policy for the 8x8 Frozen Lake environment. This environment is identical to the classic 4x4 version except that the grid is larger. SARSA iteratively improves the agent's policy based on the rewards it receives from the environment.

A Q-table Q has been initialized and pre-loaded for you, along with the update_q_table() function from the previous exercise.
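The preloaded update_q_table() is not reproduced on this page. As a minimal sketch, assuming the standard SARSA rule Q(s, a) ← (1 − α)·Q(s, a) + α·(r + γ·Q(s′, a′)), it could look like the following. The explicit Q, alpha and gamma parameters, their default values, and the nested-list Q-table are assumptions for illustration; the exercise's own version may read these from global variables or use a different array type.

```python
def update_q_table(Q, state, action, reward, next_state, next_action,
                   alpha=0.1, gamma=0.99):
    """Apply one SARSA update to Q[state][action] in place.

    Hypothetical signature: Q, alpha and gamma are explicit arguments here.
    """
    old_value = Q[state][action]
    # SARSA bootstraps from the action actually chosen in next_state,
    # not from the best action there (which would be Q-learning)
    next_value = Q[next_state][next_action]
    Q[state][action] = (1 - alpha) * old_value + alpha * (reward + gamma * next_value)


# 8x8 Frozen Lake has 64 states and 4 actions
Q = [[0.0] * 4 for _ in range(64)]
# Moving right (action 2) from state 62 onto the goal at state 63 earns reward 1
update_q_table(Q, state=62, action=2, reward=1.0, next_state=63, next_action=0)
print(Q[62][2])  # → 0.1
```

Because next_value comes from next_action rather than from the maximum over actions, the update evaluates whatever policy is actually choosing the actions.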

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Instructions

  • For each episode in the training process, execute the selected action.
  • Choose the next_action randomly.
  • Update the Q-table for the given state and action.


Try this exercise by completing the code sample below.

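for episode in range(num_episodes):
    state, info = env.reset()
    action = env.action_space.sample()
    terminated = False
    truncated = False
    # Stop when the agent reaches a terminal state or the episode hits the time limit
    while not (terminated or truncated):
        # Execute the action
        next_state, reward, terminated, truncated, info = ____
        # Choose the next action randomly
        next_action = ____
        # Update the Q-table
        ____
        # SARSA: the next action chosen here is the one executed on the next step
        state, action = next_state, next_action
render_policy(get_policy())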
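To try the same loop without installing Gymnasium, here is a self-contained sketch. TinyLake and Discrete are hypothetical stand-ins written only for illustration: a five-state corridor (hole at state 0, goal at state 4, start at state 2) whose reset() and step() mimic Gymnasium's return conventions. The nested-list Q-table, the alpha and gamma values, and the explicit arguments to update_q_table() are also assumptions, not the exercise's preloaded code. With the real environment, env.step(action) and env.action_space.sample() fill the same roles.

```python
import random


class Discrete:
    """Hypothetical stand-in for a Gymnasium Discrete space: integers 0..n-1."""

    def __init__(self, n, rng):
        self.n = n
        self._rng = rng

    def sample(self):
        # Uniformly random element of {0, ..., n-1}
        return self._rng.randrange(self.n)


class TinyLake:
    """Hypothetical 1x5 corridor: hole at state 0, goal at state 4, start at 2.

    Action 0 moves left, action 1 moves right. reset() returns (obs, info) and
    step() returns (obs, reward, terminated, truncated, info), as in Gymnasium.
    """

    def __init__(self, max_steps=20, seed=0):
        rng = random.Random(seed)
        self.observation_space = Discrete(5, rng)
        self.action_space = Discrete(2, rng)
        self.max_steps = max_steps

    def reset(self):
        self.state, self.steps = 2, 0
        return self.state, {}

    def step(self, action):
        self.state += 1 if action == 1 else -1
        self.steps += 1
        terminated = self.state in (0, 4)  # fell into the hole or reached the goal
        reward = 1.0 if self.state == 4 else 0.0
        truncated = not terminated and self.steps >= self.max_steps
        return self.state, reward, terminated, truncated, {}


def update_q_table(Q, state, action, reward, next_state, next_action,
                   alpha=0.1, gamma=0.99):
    # SARSA: bootstrap from the action actually chosen in next_state
    target = reward + gamma * Q[next_state][next_action]
    Q[state][action] = (1 - alpha) * Q[state][action] + alpha * target


env = TinyLake()
num_episodes = 500
Q = [[0.0] * env.action_space.n for _ in range(env.observation_space.n)]

for episode in range(num_episodes):
    state, info = env.reset()
    action = env.action_space.sample()
    terminated = truncated = False
    while not (terminated or truncated):
        # Execute the action
        next_state, reward, terminated, truncated, info = env.step(action)
        # Choose the next action randomly
        next_action = env.action_space.sample()
        # Update the Q-table
        update_q_table(Q, state, action, reward, next_state, next_action)
        state, action = next_state, next_action

# Greedy action for each non-terminal state (1 = move right)
policy = {s: max(range(env.action_space.n), key=lambda a: Q[s][a]) for s in (1, 2, 3)}
print(policy)
```

Because the next action is sampled uniformly at random, this Q-table estimates the action values of the random policy; taking the greedy action in each state still points the agent toward the goal in this small corridor.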