Defining the epsilon-greedy function
In reinforcement learning (RL), the epsilon-greedy strategy balances exploration and exploitation: it chooses a random action with probability epsilon and the best-known action with probability 1 - epsilon. The epsilon_greedy() function is a core building block of algorithms such as Q-learning and SARSA, because it lets the agent both explore the environment and exploit the rewards it already knows about. Implementing it is the goal of this exercise.
The numpy library has been imported as np.
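The selection rule described above can be sketched on a toy problem. This is a minimal illustration, not the exercise solution: the 4-state, 3-action Q-table, the seeded generator rng, and the name epsilon_greedy_sketch are all hypothetical, and the random action is drawn with NumPy rather than from a Gymnasium environment.

```python
import numpy as np

# Hypothetical stand-ins for an environment: 4 states, 3 actions.
rng = np.random.default_rng(seed=0)
n_states, n_actions = 4, 3
q_table = np.array([
    [0.1, 0.9, 0.2],
    [0.5, 0.3, 0.8],
    [0.7, 0.1, 0.4],
    [0.2, 0.6, 0.3],
])

def epsilon_greedy_sketch(state, epsilon):
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random
        return int(rng.integers(n_actions))
    # Exploit: pick the action with the highest Q-value for this state
    return int(np.argmax(q_table[state]))

# With epsilon = 0 the choice is always the greedy action for the state.
greedy_action = epsilon_greedy_sketch(state=0, epsilon=0.0)

# Over many draws, the greedy action for state 0 (action 1) should appear
# with frequency close to (1 - epsilon) + epsilon / n_actions.
epsilon = 0.2
samples = [epsilon_greedy_sketch(0, epsilon) for _ in range(10_000)]
greedy_fraction = samples.count(1) / len(samples)
```

Note that the greedy action is chosen slightly more often than 1 - epsilon, because a uniformly random exploratory draw can also land on it.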
This exercise is part of the course
Reinforcement Learning with Gymnasium in Python
Exercise instructions
- Inside the function, write a suitable condition for the agent to explore the environment.
- Choose a random action when exploring.
- Choose the best action according to the q_table when exploiting.
Interactive exercise
Complete the sample code to successfully finish this exercise.
epsilon = 0.2
env = gym.make('FrozenLake')
q_table = np.random.rand(env.observation_space.n, env.action_space.n)

def epsilon_greedy(state):
    # Implement the condition to explore
    if ____ < ____:
        # Choose a random action
        action = ____
    else:
        # Choose the best action according to q_table
        action = ____
    return action