Exercise

Epsilon-greediness

In this exercise, you will implement a select_action() function that applies decayed epsilon-greediness.

Epsilon-greediness makes your agent occasionally take a random action, so it explores the environment instead of always exploiting its current Q-value estimates. This should improve learning!

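The epsilon-greediness schedule sets a threshold \(\varepsilon\) for any given step according to the formula: $$\varepsilon = \text{end} + (\text{start} - \text{end}) \cdot e^{-\frac{\text{step}}{\text{decay}}}$$ At step 0 the threshold equals start, and for a positive decay it approaches end as step grows.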
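To get a feel for the schedule, the formula can be evaluated at a few steps. The sketch below is only an illustration: the helper name epsilon_at and the values start=0.9, end=0.05, decay=1000 are assumptions chosen for the example, not values given by the exercise.

```python
import math


def epsilon_at(step, start, end, decay):
    """Exponentially decayed threshold: end + (start - end) * exp(-step / decay)."""
    return end + (start - end) * math.exp(-step / decay)


# Hypothetical schedule values, chosen only for illustration.
start, end, decay = 0.9, 0.05, 1000

# Epsilon begins at `start` and shrinks toward `end` as training progresses.
for step in (0, 500, 1000, 5000):
    print(f"step={step:5d}  epsilon={epsilon_at(step, start, end, decay):.4f}")
```

A larger decay keeps the agent exploring for longer, while a smaller one moves it to mostly greedy behavior sooner.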

select_action() should return a random action with probability \(\varepsilon\), and the action with the highest Q-value with probability \(1-\varepsilon\).

Instructions

  • Calculate the threshold epsilon for the given value of step.
  • Draw a random number between 0 and 1.
  • With probability epsilon, return a random action.
  • With probability 1-epsilon, return the action with the highest Q-value.
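
The four steps above could be sketched roughly as follows. This is a minimal, framework-free version written under assumptions: the parameter list (q_values, step, start, end, decay, plus an optional rng) is a guess at the intended signature, and q_values is taken to be a plain Python list of floats rather than whatever tensor type the course environment provides.

```python
import math
import random


def select_action(q_values, step, start, end, decay, rng=random):
    """Pick an action index using decayed epsilon-greediness.

    Assumed inputs: q_values is a non-empty list of floats, one per action.
    rng is any object with random() and randrange(), e.g. random.Random(0).
    """
    # 1. Threshold epsilon for the current step.
    epsilon = end + (start - end) * math.exp(-step / decay)
    # 2. Uniform random number in [0, 1).
    sample = rng.random()
    # 3. With probability epsilon, explore: return a random action index.
    if sample < epsilon:
        return rng.randrange(len(q_values))
    # 4. Otherwise exploit: return the index of the highest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage with hypothetical Q-values and schedule parameters.
q_values = [0.1, 0.9, 0.3]
action = select_action(q_values, step=2000, start=0.9, end=0.05, decay=1000)
print(action)
```

Comparing the sample against epsilon with a strict less-than means that an epsilon of 0 never explores and an epsilon of 1 always does, since random() returns values in [0, 1).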