Exercise

Evaluating policy on a slippery Frozen Lake

In a slippery Frozen Lake environment, the agent does not always move in the intended direction, so reading the policy off a learned Q-table is not enough to judge how well that policy performs. To assess a learned policy, you must play multiple episodes and observe the average reward achieved. This exercise compares the learned policy against a baseline: the average reward collected while following a random policy during training. Your task is to run the learned policy over several episodes, compute the average reward per episode, and contrast it with the average reward per episode from the random policy phase.

The Q-table Q, num_states, num_actions, and avg_reward_per_random_episode have been pre-loaded for you. The NumPy library has been imported as np.

Instructions

  • In each iteration, select the best action to take based on the learned Q-table Q.
  • Compute the average reward per learned episode, avg_reward_per_learned_episode.
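
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: the helper evaluate_greedy_policy and the TinySlipperyCorridor class are hypothetical names introduced here, and the toy corridor only stands in for Frozen Lake so the sketch runs with NumPy alone. It assumes an environment that follows the Gymnasium convention, where reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info).

```python
import numpy as np


def evaluate_greedy_policy(env, Q, num_episodes=100, max_steps=100):
    """Play num_episodes with the greedy policy from Q; return the mean episode reward."""
    episode_rewards = []
    for _ in range(num_episodes):
        state, info = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            # Best action for the current state according to the Q-table
            action = int(np.argmax(Q[state]))
            state, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        episode_rewards.append(total_reward)
    return float(np.mean(episode_rewards))


class TinySlipperyCorridor:
    """Hypothetical stand-in environment (not Frozen Lake): states 0, 1, 2 in a row.

    Actions: 0 = left, 1 = right. With probability `slip` the chosen action
    is reversed. Reaching state 2 ends the episode with reward 1.
    """

    def __init__(self, slip=0.2, seed=0):
        self.slip = slip
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        if self.rng.random() < self.slip:
            action = 1 - action  # slipped: move the opposite way
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), 2)
        terminated = self.state == 2
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}


# Illustrative Q-table (an assumption, not a trained one): "right" is preferred.
Q = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
env = TinySlipperyCorridor(slip=0.2, seed=0)
avg_reward_per_learned_episode = evaluate_greedy_policy(env, Q, num_episodes=200, max_steps=20)
print("Average reward per learned episode:", avg_reward_per_learned_episode)
```

With Gymnasium installed, the same helper should work on an environment created with gym.make("FrozenLake-v1", is_slippery=True), using the pre-loaded Q. The resulting value can then be compared with avg_reward_per_random_episode.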