Epsilon-greediness
In this exercise, you will implement a select_action() function that applies decayed epsilon-greediness. Epsilon-greediness encourages your agent to explore the environment, which should improve learning!
The epsilon-greediness schedule determines a threshold \(\varepsilon\) for any given step, as given by the formula:
$$\varepsilon = \text{end} + (\text{start}-\text{end}) \cdot e^{-\frac{\text{step}}{\text{decay}}}$$
select_action() should return a random action with probability \(\varepsilon\), and the action with the highest Q-value with probability \(1-\varepsilon\).
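To get a feel for how the threshold decays, here is a small sketch of the schedule on its own, evaluated with the same values used in the sample code below (start=0.9, end=0.05, decay=1000):

```python
import math

def epsilon_schedule(step, start=0.9, end=0.05, decay=1000):
    """Decayed threshold: end + (start - end) * exp(-step / decay)."""
    return end + (start - end) * math.exp(-step / decay)

# Epsilon starts near `start` and decays toward `end` as step grows
for step in [1, 500, 2500]:
    print(f"step {step:>4}: epsilon = {epsilon_schedule(step):.3f}")
```

Early in training epsilon is close to 0.9 (mostly exploration); by step 2500 it has decayed near 0.05 (mostly exploitation).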
This exercise is part of the course Deep Reinforcement Learning in Python.
Exercise instructions
- Calculate the threshold epsilon for the given value of step.
- Draw a random number between 0 and 1.
- With probability epsilon, return a random action.
- With probability 1 - epsilon, return the action with the highest Q-value.
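The four steps above can be sketched as one complete function. This is just one possible completion, written here with plain Python lists so it runs without torch; in the exercise itself the greedy branch would use a torch operation such as torch.argmax(q_values).item():

```python
import math
import random

def select_action(q_values, step, start, end, decay):
    # Step 1: threshold for this step, end + (start - end) * exp(-step / decay)
    epsilon = end + (start - end) * math.exp(-step / decay)
    # Step 2: draw a random number between 0 and 1
    sample = random.random()
    if sample < epsilon:
        # Step 3: explore — return a random action index
        return random.randrange(len(q_values))
    # Step 4: exploit — return the index of the highest Q-value
    # (with a torch tensor this would be torch.argmax(q_values).item())
    return max(range(len(q_values)), key=lambda i: q_values[i])

# With start == end == 0.0, epsilon is 0 and the choice is always greedy
print(select_action([1, 2, 3, 5], step=0, start=0.0, end=0.0, decay=1000))  # prints 3
```

Note that the greedy branch does not redraw the random number: a single draw decides between exploring and exploiting.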
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import math
import random
import torch

def select_action(q_values, step, start, end, decay):
    # Calculate the threshold value for this step
    epsilon = end + (____) * math.exp(____ / ____)
    # Draw a random number between 0 and 1
    sample = random.____
    if sample < epsilon:
        # Return a random action index
        return random.____
    # Return the action index with highest Q-value
    return torch.____.item()

for step in [1, 500, 2500]:
    actions = [select_action(torch.Tensor([1, 2, 3, 5]), step, .9, .05, 1000) for _ in range(20)]
    print(f"Selecting 20 actions at step {step}.\nThe action with highest q-value is action 3.\nSelected actions: {actions}\n\n")