Solving 8x8 Frozen Lake with Q-learning

In this exercise, you'll apply the Q-learning algorithm to learn an optimal policy for navigating the 8x8 Frozen Lake environment, this time with the "slippery" condition enabled. This condition makes transitions stochastic: the agent does not always move in the direction it chooses, which more closely simulates real-world scenarios.
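
To make the slippery condition concrete, here is a minimal sketch of the transition rule that the Gymnasium documentation describes for Frozen Lake with default settings: the agent moves in the intended direction with probability 1/3 and in each of the two perpendicular directions with probability 1/3. The helper name slippery_direction is hypothetical and not part of Gymnasium, which applies this rule internally.

```python
import random
from collections import Counter

# Action encoding used by Frozen Lake: 0 = left, 1 = down, 2 = right, 3 = up
ACTION_NAMES = {0: "left", 1: "down", 2: "right", 3: "up"}

def slippery_direction(action, rng=random):
    """Return the direction actually taken for an intended action.

    Hypothetical helper: chooses uniformly among the intended direction
    and its two perpendicular neighbours, so each has probability 1/3.
    """
    candidates = [(action - 1) % 4, action, (action + 1) % 4]
    return rng.choice(candidates)

# Sample many outcomes for the intended action "right" (2)
rng = random.Random(0)
n_samples = 3000
counts = Counter(slippery_direction(2, rng) for _ in range(n_samples))
for direction, n in sorted(counts.items()):
    print(f"intended right -> moved {ACTION_NAMES[direction]}: {n / n_samples:.2f}")
```

Each of the three possible directions should come out close to one third, which is why even a good policy can fall into a hole on a slippery lake.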

A Q-table Q has been initialized and pre-loaded for you, along with the update_q_table() function from the previous exercise and an empty list, rewards_per_episode, that will store the total reward accumulated in each episode.
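
The pre-loaded update_q_table() is not shown here. As a rough sketch only, a standard Q-learning update could look like the following. The values of alpha and gamma are assumptions, since the exercise does not state them, and the real pre-loaded function may differ in its details.

```python
import numpy as np

# Assumed hyperparameters; the exercise does not state the values it uses.
alpha = 0.1   # learning rate
gamma = 0.99  # discount factor

# 8x8 Frozen Lake has 64 states and 4 actions.
Q = np.zeros((64, 4))

def update_q_table(state, action, reward, next_state):
    """Apply one Q-learning update to the global table Q in place."""
    old_value = Q[state, action]
    next_max = np.max(Q[next_state])  # value of the best action in next_state
    td_target = reward + gamma * next_max
    Q[state, action] = (1 - alpha) * old_value + alpha * td_target

# Example: moving right from state 62 into the goal (state 63) with reward 1
update_q_table(state=62, action=2, reward=1.0, next_state=63)
print(Q[62, 2])
```

The update blends the old estimate with the new target, so a single rewarding transition only moves the entry part of the way toward the observed return.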

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python

Exercise instructions

  • At each step of an episode, execute the selected action and observe the reward and next state.
  • Update the Q-table.
  • Append the total_reward to the rewards_per_episode list.

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

for episode in range(10000):
    state, info = env.reset()
    total_reward = 0
    terminated = False
    while not terminated:
        action = env.action_space.sample()
        # Execute the action
        next_state, reward, terminated, truncated, info = ____
        # Update the Q-table
        ____
        state = next_state
        total_reward += reward
    # Append the total reward to the rewards list    
    rewards_per_episode.____(____)
print("Average reward per random episode: ", np.mean(rewards_per_episode))
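
For reference, the same training loop can be sketched as a standalone function. This is one possible arrangement, not the exercise's official solution: the function name run_random_episodes is hypothetical, alpha and gamma are assumed values, and the Q-learning update is inlined as a stand-in for the pre-loaded update_q_table(). Unlike the template, the sketch also ends an episode when the environment reports truncated, so a time limit cannot be overrun.

```python
import numpy as np

def run_random_episodes(env, Q, n_episodes, alpha=0.1, gamma=0.99):
    """Train Q with random exploration; return per-episode total rewards.

    `env` is any object following the Gymnasium reset()/step() API.
    alpha and gamma are assumed values, not taken from the exercise.
    """
    rewards_per_episode = []
    for _ in range(n_episodes):
        state, info = env.reset()
        total_reward = 0
        done = False
        while not done:
            # Random behaviour policy, as in the exercise template
            action = env.action_space.sample()
            next_state, reward, terminated, truncated, info = env.step(action)
            # Q-learning update (inlined stand-in for update_q_table)
            td_target = reward + gamma * np.max(Q[next_state])
            Q[state, action] = (1 - alpha) * Q[state, action] + alpha * td_target
            state = next_state
            total_reward += reward
            # Stop on a time-limit truncation as well as on termination
            done = terminated or truncated
        rewards_per_episode.append(total_reward)
    return rewards_per_episode
```

With Gymnasium installed, it could be called with an environment created by `gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)` and a table built as `np.zeros((env.observation_space.n, env.action_space.n))`.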