Solving CliffWalking with a decayed epsilon-greedy strategy
To enhance the epsilon-greedy strategy, a decay factor is introduced that gradually decreases the exploration rate, epsilon, as the agent learns more about the environment. This promotes exploration in the early stages of learning and exploitation of the learned knowledge once the agent becomes more familiar with the environment. Now, you'll apply this strategy to solve the CliffWalking environment.
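To make the schedule concrete, here is a minimal sketch of the decay rule with illustrative values (the actual epsilon, epsilon_decay, and min_epsilon are pre-defined in this exercise and may differ):

# Minimal sketch of the decay schedule; the starting values below are
# illustrative, not the exercise's pre-defined ones.
epsilon, epsilon_decay, min_epsilon = 1.0, 0.999, 0.01

for episode in range(1000):
    # Multiplicative decay, clipped at the exploration floor
    epsilon = max(min_epsilon, epsilon * epsilon_decay)

print(epsilon)  # ~0.368 after 1000 episodes: still exploring, but less often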
The environment has been initialized and can be accessed through the variable env. The variables epsilon, min_epsilon, and epsilon_decay have been pre-defined for you, and the functions epsilon_greedy() and update_q_table() have been imported.
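The exact signatures of the imported helpers are not shown on this page. The sketch below is one plausible implementation, assuming tabular Q-learning with a Q-table Q and hyperparameters alpha (learning rate) and gamma (discount factor) defined in the surrounding scope:

# Hypothetical implementations of the imported helpers (assumed, not shown
# in the exercise). They rely on env, epsilon, Q, alpha, and gamma being
# defined in the surrounding scope.
import numpy as np

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise act greedily
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return np.argmax(Q[state])

def update_q_table(state, action, reward, new_state):
    # Standard Q-learning update toward the bootstrapped target
    Q[state, action] = (1 - alpha) * Q[state, action] + \
        alpha * (reward + gamma * np.max(Q[new_state]))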
This exercise is part of the course
Reinforcement Learning with Gymnasium in Python
Exercise instructions
- Implement the full training loop: choose an action, execute it, accumulate the received reward into episode_reward, and update the Q-table.
- Decrease epsilon using the epsilon_decay rate, ensuring it does not fall below min_epsilon.
Interactive exercise
Complete the sample code to finish this exercise successfully.
import numpy as np  # needed for np.mean below

rewards_decay_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    episode_reward = 0
    for i in range(max_steps):
        # Choose an action with the (decayed) epsilon-greedy strategy
        # (signature assumed, matching the sketch above)
        action = epsilon_greedy(state)
        # Execute the action in the environment
        new_state, reward, terminated, truncated, info = env.step(action)
        # Accumulate the received reward
        episode_reward += reward
        # Update the Q-table with the observed transition
        update_q_table(state, action, reward, new_state)
        state = new_state
        # Stop the episode once it terminates or the step limit is hit
        if terminated or truncated:
            break
    rewards_decay_eps_greedy.append(episode_reward)
    # Decay epsilon multiplicatively, but never below min_epsilon
    epsilon = max(min_epsilon, epsilon * epsilon_decay)
print("Average reward per episode: ", np.mean(rewards_decay_eps_greedy))