
Evaluating policy on a slippery Frozen Lake

In a slippery Frozen Lake environment, merely deducing the policy from a learned Q-table isn't enough to gauge its effectiveness. To properly assess a learned policy, you must play multiple episodes and observe the average reward achieved. This exercise compares the effectiveness of the learned policy against a baseline established by following a random policy during training. Your task is to execute the learned policy over several episodes and analyze its performance, contrasting the average reward it collects with the average reward collected during the random-policy phase.
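Deducing the greedy policy from the Q-table is itself a one-liner. A minimal sketch, assuming Q is a NumPy array of shape (num_states, num_actions):

# Greedy policy: for each state, the action with the highest Q-value
policy = {state: np.argmax(Q[state]) for state in range(num_states)}

On a slippery lake, though, transitions are stochastic, so this table alone says nothing about how often the policy actually reaches the goal; hence the evaluation rollouts below.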

The Q-table Q, num_states, num_actions, and avg_reward_per_random_episode have been pre-loaded for you. The NumPy library has been imported as np.
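For context, avg_reward_per_random_episode was presumably produced by a rollout loop of the same shape as the exercise code below, only with uniformly random actions. A hedged sketch, assuming a Gymnasium environment env (the list name here is illustrative):

reward_per_random_episode = []
for episode in range(10000):
    state, info = env.reset()
    terminated = truncated = False
    episode_reward = 0
    while not (terminated or truncated):
        # Sample an action uniformly at random
        action = env.action_space.sample()
        state, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
    reward_per_random_episode.append(episode_reward)
avg_reward_per_random_episode = np.mean(reward_per_random_episode)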

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • In each iteration, select the best action to take based on the learned Q-table Q (see the sketch after this list).
  • Compute the average reward per learned episode, avg_reward_per_learned_episode.
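Concretely, both blanks reduce to single NumPy calls. A minimal sketch, assuming Q is indexed as Q[state, action] and reward_per_learned_episode is the list filled during the loop:

# Greedy action for the current state
action = np.argmax(Q[state])
# Mean episode reward across all evaluation episodes
avg_reward_per_learned_episode = np.mean(reward_per_learned_episode)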

Hands-on interactive exercise

Try this exercise by completing the sample code.

reward_per_learned_episode = []
for episode in range(10000):
    state, info = env.reset()
    terminated = truncated = False
    episode_reward = 0
    while not (terminated or truncated):
        # Select the best action based on the learned Q-table
        action = ____
        new_state, reward, terminated, truncated, info = env.step(action)
        state = new_state
        episode_reward += reward
    reward_per_learned_episode.append(episode_reward)
# Compute and print the average reward per learned episode
avg_reward_per_learned_episode = ____
print("Average reward per learned episode: ", avg_reward_per_learned_episode)
print("Average reward per random episode: ", ____)