
Solving CliffWalking with the epsilon-greedy strategy

The CliffWalking environment is a standard testbed for RL algorithms. It's a grid world where an agent must find a path from a start state to a goal state while avoiding the cliff along the way. The epsilon-greedy strategy lets the agent keep exploring the environment while exploiting what it has already learned, so it can learn to avoid the cliff and maximize the cumulative reward. Your task is to solve this environment using the epsilon-greedy strategy, compute the reward attained in each training episode, and save the results to the rewards_eps_greedy list.
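At each step, epsilon-greedy action selection reduces to a single coin flip: with probability epsilon the agent picks a random action (exploration), otherwise it picks the action with the highest estimated value (exploitation). Below is a minimal sketch of such a selector; the q_table, epsilon, and n_actions arguments and the signature itself are illustrative assumptions, and the exercise's pre-defined epsilon_greedy() may differ.

import numpy as np

def epsilon_greedy(state, q_table, epsilon, n_actions, rng=np.random.default_rng()):
    # With probability epsilon, explore with a uniformly random action;
    # otherwise exploit the current value estimates for this state.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_table[state]))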

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • Within an episode, select an action using the epsilon_greedy() function.
  • Add the received reward to episode_reward.
  • After each episode, append the total episode_reward to the rewards_eps_greedy list for later analysis.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

rewards_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    episode_reward = 0
    for i in range(max_steps):
        # Select action with epsilon-greedy strategy
        action = ____
        next_state, reward, terminated, truncated, info = env.step(action)
        # Accumulate reward
        ____        
        update_q_table(state, action, reward, next_state)      
        state = next_state
    # Append the total reward to the rewards list
    ____
print("Average reward per episode: ", np.mean(rewards_eps_greedy))