Solving CliffWalking with an epsilon-greedy strategy

The CliffWalking environment is a standard testbed for reinforcement learning (RL) algorithms. It is a grid world in which an agent must find a path from a start state to a goal state without stepping off a cliff along the way. An epsilon-greedy strategy lets the agent explore: with a small probability epsilon it takes a random action, and otherwise it takes the action it currently estimates to be best. This balance helps it learn to avoid the cliff while maximizing cumulative reward. Your task is to solve this environment using the epsilon-greedy strategy, compute the reward attained in each training episode, and save these rewards to the rewards_eps_greedy list.
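As a rough illustration of the selection rule, here is a minimal sketch of an epsilon-greedy action picker. The signature is an assumption: the exercise's own epsilon_greedy() may take different arguments (for example, only the current state), and q_values here is a hypothetical list of estimated action values for one state.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is a hypothetical list of estimated values, one per action.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative estimates for four actions in a single state.
q_row = [-3.0, -1.0, -2.5, -4.0]
greedy_action = epsilon_greedy(q_row, epsilon=0.0)  # never explores
random_action = epsilon_greedy(q_row, epsilon=1.0)  # always explores
```

With epsilon set to 0 the agent never explores, and with epsilon set to 1 it never exploits; values around 0.1 are a common middle ground.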

Within an episode, select an action using the epsilon_greedy() function.
Add each received reward to episode_reward.
After each episode, append the total episode_reward to the rewards_eps_greedy list for later analysis.
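The three steps above can be sketched as a training loop. This is only an illustration under several assumptions: the grid below is a small hand-written stand-in for the CliffWalking environment (4 by 12, with -1 per step and -100 plus a return to the start for stepping into the cliff), the update is a plain Q-learning rule that the exercise text does not specify, and step(), train(), and all hyperparameter values are hypothetical.

```python
import random

N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Stand-in environment: return (next_state, reward, done)."""
    row = min(max(state[0] + MOVES[action][0], 0), N_ROWS - 1)
    col = min(max(state[1] + MOVES[action][1], 0), N_COLS - 1)
    if row == N_ROWS - 1 and 0 < col < N_COLS - 1:
        return START, -100, False  # fell off the cliff, back to start
    return (row, col), -1, (row, col) == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(MOVES))
    return max(range(len(MOVES)), key=lambda a: q[state][a])


def train(n_episodes=300, alpha=0.5, gamma=0.99, epsilon=0.1,
          max_steps=500, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(MOVES)
         for r in range(N_ROWS) for c in range(N_COLS)}
    rewards_eps_greedy = []
    for _ in range(n_episodes):
        state, episode_reward = START, 0
        for _ in range(max_steps):
            # Step 1: select an action with epsilon_greedy().
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Assumed Q-learning update; not specified by the exercise.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            # Step 2: add the received reward to episode_reward.
            episode_reward += reward
            state = next_state
            if done:
                break
        # Step 3: store the episode total for later analysis.
        rewards_eps_greedy.append(episode_reward)
    return rewards_eps_greedy


rewards_eps_greedy = train()
```

Plotting rewards_eps_greedy against the episode index is one way to see whether the per-episode totals move toward the best achievable value as training proceeds.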
