Epsilon-greediness

In this exercise, you will implement a select_action() function that applies epsilon-greediness with a decaying epsilon.

Epsilon-greediness encourages your agent to explore the environment instead of always acting on its current Q-value estimates, which should improve learning.

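The decay schedule sets the threshold $\varepsilon$ at any given step according to the formula: $$\varepsilon = end + (start-end) \cdot e^{-\frac{step}{decay}}$$ At step 0, $\varepsilon$ equals start; as step grows, it decays exponentially toward end. With start greater than end, the agent explores heavily early in training and increasingly exploits what it has learned.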
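The schedule can be checked numerically with a short sketch. The function name epsilon_schedule and the default values start=0.9, end=0.05, decay=1000 are illustrative assumptions, not values given in the exercise.

```python
import math

def epsilon_schedule(step, start=0.9, end=0.05, decay=1000):
    """Return epsilon for `step`, decaying exponentially from `start` toward `end`."""
    return end + (start - end) * math.exp(-step / decay)

# The default start/end/decay values are illustrative assumptions only.
for step in [0, 1000, 5000]:
    print(step, round(epsilon_schedule(step), 3))
# prints:
# 0 0.9
# 1000 0.363
# 5000 0.056
```

After decay steps, the gap between $\varepsilon$ and end has shrunk by a factor of $e \approx 2.72$, so decay controls how quickly exploration fades.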

select_action() should return a random action with probability $\varepsilon$, and the action with highest Q-value with probability $1-\varepsilon$.
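Assuming the random action is drawn uniformly from the $n$ available actions and a single action $a^*$ has the highest Q-value, the random branch can also land on $a^*$, so the overall action probabilities are:

$$P(a^*) = (1-\varepsilon) + \frac{\varepsilon}{n}, \qquad P(a) = \frac{\varepsilon}{n} \quad \text{for } a \neq a^*$$

For example, with $\varepsilon = 0.1$ and $n = 4$, the greedy action is chosen with probability $0.9 + 0.025 = 0.925$ and each other action with probability $0.025$.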

Calculate the threshold epsilon for the given value of step.
Draw a random number between 0 and 1.
With probability epsilon, return a random action.
With probability 1-epsilon, return the action with highest Q-value.
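The four steps above can be sketched as a single function. This is a minimal sketch, assuming the Q-values arrive as a plain Python list with one entry per action; in practice they would typically come from a Q-network, and the extra parameters start, end and decay (with their default values) are hypothetical.

```python
import math
import random

def select_action(q_values, step, start=0.9, end=0.05, decay=1000):
    """Epsilon-greedy action selection with an exponentially decaying epsilon.

    `q_values` is assumed to be a plain list of Q-values, one per action.
    The default start/end/decay values are illustrative assumptions.
    """
    # 1. Threshold epsilon for this step, using the schedule formula.
    epsilon = end + (start - end) * math.exp(-step / decay)
    # 2. Draw a random number in [0, 1).
    sample = random.random()
    if sample < epsilon:
        # 3. With probability epsilon: explore with a uniformly random action.
        return random.randrange(len(q_values))
    # 4. With probability 1 - epsilon: exploit the action with the highest Q-value
    #    (ties go to the lowest index, because max() keeps the first maximum).
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.7, 0.3]
actions = [select_action(q, step=10_000) for _ in range(1000)]
# Late in training epsilon is close to end, so action 1 dominates (roughly 0.97).
print(actions.count(1) / len(actions))
```

Because random.random() returns values in $[0, 1)$, the condition sample < epsilon holds with probability exactly $\varepsilon$.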
