
Epsilon-greediness

In this exercise, you will implement a select_action() function that applies decayed epsilon-greediness.

Epsilon-greediness will encourage your agent to explore the environment, which should improve learning!

The epsilon-greediness schedule determines a threshold \(\varepsilon\) for any given step, as given by the formula: $$\varepsilon = \text{end} + (\text{start} - \text{end}) \cdot e^{-\frac{\text{step}}{\text{decay}}}$$

select_action() should return a random action with probability \(\varepsilon\), and the action with the highest Q-value with probability \(1-\varepsilon\).
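
To get a feel for how quickly exploration fades, here is a minimal sketch (not part of the exercise) that evaluates the schedule with the same parameters used in the sample run below (start=0.9, end=0.05, decay=1000); the helper name epsilon_threshold is just for illustration.

import math

def epsilon_threshold(step, start, end, decay):
    # Decayed epsilon: close to `start` early on, approaching `end` as step grows
    return end + (start - end) * math.exp(-step / decay)

for step in [1, 500, 2500]:
    print(f"step={step}: epsilon ~ {epsilon_threshold(step, 0.9, 0.05, 1000):.3f}")
# step=1: epsilon ~ 0.899
# step=500: epsilon ~ 0.566
# step=2500: epsilon ~ 0.120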

This exercise is part of the course Deep Reinforcement Learning in Python.

Exercise instructions

  • Calculate the threshold epsilon for the given value of step.
  • Draw a random number between 0 and 1.
  • With probability epsilon, return a random action.
  • With probability 1-epsilon, return the action with the highest Q-value.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def select_action(q_values, step, start, end, decay):
    # Calculate the threshold value for this step
    epsilon = end + (____) * math.exp(____ / ____)
    # Draw a random number between 0 and 1
    sample = random.____
    if sample < epsilon:
        # Return a random action index
        return random.____
    # Return the action index with highest Q-value
    return torch.____.item()
      
for step in [1, 500, 2500]:
    actions = [select_action(torch.Tensor([1, 2, 3, 5]), step, .9, .05, 1000) for _ in range(20)]
    print(f"Selecting 20 actions at step {step}.\nThe action with highest q-value is action 3.\nSelected actions: {actions}\n\n")