
Defining epsilon-greedy function

In reinforcement learning (RL), the epsilon-greedy strategy balances exploration and exploitation: with probability epsilon the agent chooses a random action, and with probability 1-epsilon it chooses the best-known action. The epsilon_greedy() function is a core building block of algorithms such as Q-learning and SARSA, because it lets the agent keep exploring the environment while still exploiting the rewards it has already learned about. In this exercise, you will implement it.
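To make these probabilities concrete, the sketch below computes the action distribution that epsilon-greedy induces for a single state. The helper name epsilon_greedy_probs and the example Q-values are illustrative assumptions, not part of the exercise. It also assumes the random action is drawn uniformly over all actions, so the best-known action can be picked while exploring too.

```python
import numpy as np

def epsilon_greedy_probs(q_values, epsilon):
    """Probability of each action under epsilon-greedy (hypothetical helper)."""
    n_actions = len(q_values)
    # Exploration: every action receives an equal share of epsilon.
    probs = np.full(n_actions, epsilon / n_actions)
    # Exploitation: the greedy action receives the remaining 1 - epsilon.
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs

# Made-up Q-values for one state with four actions.
probs = epsilon_greedy_probs(np.array([0.1, 0.5, 0.2, 0.3]), epsilon=0.2)
print(probs)
```

Under the uniform-exploration assumption, the best-known action is selected with total probability 1 - epsilon + epsilon / n_actions, which for epsilon = 0.2 and four actions is 0.85, while each other action gets 0.05.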

The numpy library has been imported as np.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • Inside the function, write the suitable condition for an agent to explore the environment.
  • Choose a random action when exploring.
  • Choose the best action according to the q_table when exploiting.

Hands-on interactive exercise

Try this exercise by completing the sample code.

import gymnasium as gym  # assumed to be preloaded by the exercise, like numpy

epsilon = 0.2
env = gym.make('FrozenLake-v1')
q_table = np.random.rand(env.observation_space.n, env.action_space.n)

def epsilon_greedy(state):
    # Implement the condition to explore
    if ____ < ____:
        # Choose a random action
        action = ____
    else:
        # Choose the best action according to q_table
        action = ____
    return action
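
One possible way to complete the template is sketched below. It is not the official solution. To stay runnable without creating the environment, it assumes 16 states and 4 actions (the sizes of the default 4x4 FrozenLake map) and draws random numbers from a seeded NumPy generator named rng, which is an assumption of this sketch. In the exercise itself, where env exists, the exploration test would typically be np.random.rand() < epsilon and the random action env.action_space.sample().

```python
import numpy as np

# Assumed sizes: 16 states and 4 actions, as in the default 4x4 FrozenLake map.
n_states, n_actions = 16, 4
epsilon = 0.2
rng = np.random.default_rng(seed=0)  # seeded generator, an assumption of this sketch
q_table = rng.random((n_states, n_actions))

def epsilon_greedy(state):
    # Explore with probability epsilon
    if rng.random() < epsilon:
        # Random action, drawn uniformly from the action space
        action = int(rng.integers(n_actions))
    else:
        # Greedy action: index of the highest Q-value in this state's row
        action = int(np.argmax(q_table[state]))
    return action

# Rough check: share of greedy picks for state 0 over many calls
greedy = int(np.argmax(q_table[0]))
picks = [epsilon_greedy(0) for _ in range(10_000)]
greedy_share = sum(a == greedy for a in picks) / len(picks)
print(greedy_share)
```

With epsilon = 0.2 and four actions, the greedy share should land near 0.85 rather than 0.8, because a uniformly random exploratory action can coincide with the greedy one.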