Solving CliffWalking with the epsilon-greedy strategy
The CliffWalking environment is a standard testbed for RL algorithms. It is a grid world where an agent must find a path from a start state to a goal state while avoiding the cliff cells along the way. The epsilon-greedy strategy lets the agent balance exploration of the environment with exploitation of what it has already learned, so it can discover a cliff-avoiding path that maximizes cumulative reward. Your task is to solve this environment using the epsilon-greedy strategy, compute the reward attained in each training episode, and save these rewards to the rewards_eps_greedy list.
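The epsilon_greedy() helper is provided by the exercise environment. As a reminder of the idea, here is a minimal sketch of how such a function typically works, assuming a NumPy Q-table indexed as q_table[state, action] (the exact signature in the course may differ):

```python
import numpy as np

def epsilon_greedy(q_table, state, epsilon):
    # Explore: with probability epsilon, pick a uniformly random action
    if np.random.rand() < epsilon:
        return int(np.random.randint(q_table.shape[1]))
    # Exploit: otherwise pick the action with the highest Q-value for this state
    return int(np.argmax(q_table[state]))
```

With epsilon = 0 this always returns the greedy action; with epsilon = 1 it always explores.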
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Exercise instructions
- Within an episode, select an action using the epsilon_greedy() function.
- Accumulate the received reward into episode_reward.
- After each episode, append the total episode_reward to the rewards_eps_greedy list for later analysis.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
rewards_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    episode_reward = 0
    for i in range(max_steps):
        # Select action with epsilon-greedy strategy
        action = ____
        next_state, reward, terminated, truncated, info = env.step(action)
        # Accumulate reward
        ____
        update_q_table(state, action, reward, next_state)
        state = next_state
        # Stop the episode once the agent reaches the goal or falls off the cliff
        if terminated or truncated:
            break
    # Append the total reward to the rewards list
    ____
print("Average reward per episode: ", np.mean(rewards_eps_greedy))
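To see the whole training loop working end to end without the Gymnasium dependency, here is a self-contained sketch that uses a tiny toy corridor environment in place of CliffWalking, with simple stand-ins for epsilon_greedy and update_q_table (all names and hyperparameters here are illustrative, not the course's exact code):

```python
import numpy as np

# Toy stand-in for CliffWalking: a 5-state corridor. Action 1 moves right
# toward the goal (+10 and the episode ends); action 0 moves left (-1 per step).
class Corridor:
    n_states, n_actions = 5, 2
    def reset(self):
        self.state = 0
        return self.state, {}
    def step(self, action):
        self.state = min(self.state + 1, 4) if action == 1 else max(self.state - 1, 0)
        terminated = self.state == 4
        reward = 10.0 if terminated else -1.0
        return self.state, reward, terminated, False, {}

env = Corridor()
q_table = np.zeros((env.n_states, env.n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
total_episodes, max_steps = 200, 50
rng = np.random.default_rng(42)

def epsilon_greedy(state):
    if rng.random() < epsilon:
        return int(rng.integers(env.n_actions))  # explore
    return int(np.argmax(q_table[state]))        # exploit

def update_q_table(state, action, reward, next_state):
    # Standard Q-learning update toward the bootstrapped target
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])

rewards_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    episode_reward = 0
    for i in range(max_steps):
        action = epsilon_greedy(state)                 # select action
        next_state, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward                       # accumulate reward
        update_q_table(state, action, reward, next_state)
        state = next_state
        if terminated or truncated:
            break
    rewards_eps_greedy.append(episode_reward)          # append total reward

print("Average reward per episode: ", np.mean(rewards_eps_greedy))
```

The same three fill-in steps from the exercise appear here as the action selection, the reward accumulation, and the per-episode append; only the environment and helper internals are toy substitutes.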