Barebone DQN loss function

With the select_action() function now ready, you are one step away from being able to train your agent: you will now implement calculate_loss().

The calculate_loss() function returns the network loss for a given step of the episode.

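For reference, the loss is given by the squared Bellman error:

$$L(\theta) = \left( r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \right)^2$$

Here $Q(s, a; \theta)$ is the current state Q-value and $r + \gamma \max_{a'} Q(s', a'; \theta)$ is the TD-target.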

The following example data has been loaded in the exercise:

state = torch.rand(8)
next_state = torch.rand(8)
action = select_action(q_network, state)
reward = 1
gamma = .99
done = False

Obtain the current state Q-value.
Obtain the next state Q-value.
Calculate the target Q-value, or TD-target.
Calculate the loss function, i.e. the squared Bellman Error.
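
The four steps above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the q_network defined here is a hypothetical stand-in (8 state features, 4 actions), the action is hard-coded rather than taken from select_action(), and the (1 - done) factor that drops the bootstrap term on terminal steps is an assumption about how done is meant to be used.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the exercise's q_network: 8 state features, 4 actions.
q_network = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

def calculate_loss(q_network, state, action, next_state, reward, done, gamma=0.99):
    # Q-values for every action in the current state, shape (n_actions,)
    q_values = q_network(state)
    # Current state Q-value: the entry for the action actually taken
    current_state_q_value = q_values[action]
    # Next state Q-value: the largest Q-value available from next_state
    next_state_q_value = q_network(next_state).max()
    # TD-target; (1 - done) zeroes the bootstrap term on terminal steps (assumption)
    target_q_value = reward + gamma * next_state_q_value * (1 - done)
    # Squared Bellman error; the target is not detached in this barebone sketch
    loss = (target_q_value - current_state_q_value) ** 2
    return loss

torch.manual_seed(0)
state = torch.rand(8)
next_state = torch.rand(8)
action = 2  # hypothetical; the exercise obtains it from select_action(q_network, state)
loss = calculate_loss(q_network, state, action, next_state, reward=1, done=False, gamma=0.99)
```

The result is a scalar tensor, so loss.backward() can be called on it directly. Because the same network produces both the current Q-value and the target, gradients flow through both terms here.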
