Reinforcement Learning with Gymnasium in Python


Exercise

Solving CliffWalking with the epsilon-greedy strategy

The CliffWalking environment is a standard testbed for RL algorithms. It is a grid world in which an agent must find a path from a start state to a goal state while avoiding the cliff cells along the way: every step costs -1, and stepping into the cliff costs -100 and sends the agent back to the start. The epsilon-greedy strategy lets the agent explore the environment while it learns to avoid the cliff. With probability epsilon it takes a random action, and otherwise it takes the action with the highest estimated value, which steers it toward maximizing the cumulative reward. Your task is to solve this environment using the epsilon-greedy strategy, compute the total reward obtained in each training episode, and save these totals to the rewards_eps_greedy list.
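The exercise provides its own epsilon_greedy() helper, but its signature is not shown here. As a minimal sketch, assuming a tabular Q-table stored as a list of per-state action-value lists, the selection rule could look like this (the parameter names and the `rng` argument are hypothetical):

```python
import random

def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Hypothetical epsilon-greedy action selection for a tabular Q-table.

    With probability `epsilon`, return a uniformly random action (explore);
    otherwise return the action with the highest Q-value (exploit).
    """
    row = q_table[state]
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    # Greedy choice; max() keeps the first index when several values tie.
    return max(range(len(row)), key=lambda a: row[a])

# With epsilon = 0 the rule is purely greedy and picks the largest Q-value.
print(epsilon_greedy([[0.0, 2.0, -1.0, 0.5]], 0, epsilon=0.0))  # → 1
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible; the default falls back to the global `random` module.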

Instructions

  • Within an episode, select an action using the epsilon_greedy() function.
  • Add each received reward to episode_reward.
  • After each episode, append the total episode_reward to the rewards_eps_greedy list for later analysis.
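The three steps above can be sketched end to end as follows. This is a self-contained sketch under several assumptions: a plain-Python stand-in replaces the Gymnasium environment (4 x 12 grid, -1 per step, -100 and a reset to the start for the cliff), the learning rule is tabular Q-learning, and `step`, `to_index`, `train`, the hyperparameters, and the `max_steps` safety cap are all hypothetical names and values not taken from the exercise.

```python
import random

# Hypothetical plain-Python stand-in for the CliffWalking grid (4 rows x 12 cols).
N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # 0=up, 1=right, 2=down, 3=left

def to_index(pos):
    """Flatten a (row, col) position into a state index 0..47."""
    return pos[0] * N_COLS + pos[1]

def step(pos, action):
    """Return (next_pos, reward, terminated) for one move."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), N_ROWS - 1)  # walls keep the agent on the grid
    c = min(max(pos[1] + dc, 0), N_COLS - 1)
    if r == 3 and 1 <= c <= 10:               # stepped into the cliff
        return START, -100, False              # big penalty, back to the start
    return (r, c), -1, (r, c) == GOAL

def epsilon_greedy(q_table, state, epsilon, rng):
    """Random action with probability epsilon, else the greedy action."""
    row = q_table[state]
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    return max(range(len(row)), key=lambda a: row[a])

def train(n_episodes=500, alpha=0.1, gamma=1.0, epsilon=0.1, max_steps=10_000, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0] * 4 for _ in range(N_ROWS * N_COLS)]
    rewards_eps_greedy = []
    for _ in range(n_episodes):
        pos, episode_reward = START, 0
        for _ in range(max_steps):  # assumed safety cap on episode length
            s = to_index(pos)
            action = epsilon_greedy(q_table, s, epsilon, rng)  # 1. select an action
            pos, reward, terminated = step(pos, action)
            episode_reward += reward                           # 2. accumulate the reward
            # Assumed learning rule: one-step tabular Q-learning update.
            target = reward + (0.0 if terminated else gamma * max(q_table[to_index(pos)]))
            q_table[s][action] += alpha * (target - q_table[s][action])
            if terminated:
                break
        rewards_eps_greedy.append(episode_reward)              # 3. store the episode total
    return q_table, rewards_eps_greedy

q_table, rewards_eps_greedy = train()
print(len(rewards_eps_greedy), max(rewards_eps_greedy))
```

Because the shortest safe path takes 13 steps, no episode total can exceed -13; with a fixed epsilon the agent still occasionally steps off the cliff edge, so later episode totals typically stay below that bound even after learning.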