
Solving CliffWalking with a decayed epsilon-greedy strategy

To enhance the epsilon-greedy strategy, a decay factor is introduced that gradually decreases the exploration rate, epsilon, as the agent learns more about the environment. This approach promotes exploration in the early stages of learning and exploitation of the learned knowledge as the agent becomes more familiar with the environment. Now, you'll apply this strategy to solve the CliffWalking environment.
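With a multiplicative decay, the update applied after each episode is epsilon = max(min_epsilon, epsilon * epsilon_decay). A minimal sketch of how this schedule behaves (the starting values below are hypothetical, not the exercise's actual settings):

epsilon = 1.0         # hypothetical starting exploration rate
min_epsilon = 0.01    # hypothetical lower bound on exploration
epsilon_decay = 0.99  # hypothetical per-episode decay factor

for episode in range(3):
    # Shrink epsilon each episode, but never below the floor
    epsilon = max(min_epsilon, epsilon * epsilon_decay)
    print(f"Episode {episode}: epsilon = {epsilon:.4f}")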

The environment has been initialized and can be accessed through the variable env. The variables epsilon, min_epsilon, and epsilon_decay have been pre-defined for you. The functions epsilon_greedy() and update_q_table() have been imported.
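The exact implementations come from the course, but as a rough sketch, such helpers for tabular Q-learning commonly look like the following (the signatures and hyperparameters here are assumptions, not the course's actual code):

import numpy as np

def epsilon_greedy(Q, state, epsilon, n_actions):
    # Explore with probability epsilon, otherwise exploit
    # the best known action for this state
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update_q_table(Q, state, action, reward, new_state, alpha=0.1, gamma=0.99):
    # Standard Q-learning update toward the bootstrapped target
    # (the alpha and gamma values here are illustrative)
    target = reward + gamma * np.max(Q[new_state])
    Q[state, action] += alpha * (target - Q[state, action])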

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Instructions

  • Implement the full training loop: choose an action, execute it, accumulate the received reward in episode_reward, and update the Q-table.
  • Decrease epsilon using the epsilon_decay rate, ensuring it does not fall below min_epsilon.

Hands-on interactive exercise

Try this exercise by completing this sample code.

rewards_decay_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    episode_reward = 0
    for i in range(max_steps):
        # Implement the training loop
        action = ____
        new_state, reward, terminated, truncated, info = ____
        episode_reward += ____       
        ____      
        state = new_state
    rewards_decay_eps_greedy.append(episode_reward)
    # Update epsilon
    epsilon = ____
print("Average reward per episode: ", np.mean(rewards_decay_eps_greedy))
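For reference, a completed version of the loop might look like this; it assumes total_episodes, max_steps, and the Q-table Q are pre-defined in the exercise environment, and that the helpers take the arguments sketched above:

rewards_decay_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    episode_reward = 0
    for i in range(max_steps):
        # Choose an action with the current exploration rate
        action = epsilon_greedy(Q, state, epsilon, env.action_space.n)
        # Execute the action in the environment
        new_state, reward, terminated, truncated, info = env.step(action)
        # Accumulate the reward received this step
        episode_reward += reward
        # Update the Q-table with the observed transition
        update_q_table(Q, state, action, reward, new_state)
        state = new_state
        # Stop early once the episode ends
        if terminated or truncated:
            break
    rewards_decay_eps_greedy.append(episode_reward)
    # Decay epsilon, keeping it above min_epsilon
    epsilon = max(min_epsilon, epsilon * epsilon_decay)
print("Average reward per episode: ", np.mean(rewards_decay_eps_greedy))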