Exercise

A2C with batch updates

In this course so far, you have been using variations around the same core DRL training loop. In practice, there are a number of ways in which this structure can be extended, for example to accommodate batch updates.

You will now revisit the A2C training loop on the Lunar Lander environment, but instead of updating the networks at every step, you will wait until 10 steps have elapsed before running the gradient descent step. By averaging the losses over 10 steps, you will benefit from slightly more stable updates.

Instructions

100 XP

Append the losses from each step to the loss tensors for the current batch.
Calculate the batch losses.
Reinitialize the loss tensors.

.css-6su6fj{-webkit-flex-shrink:0;-ms-flex-negative:0;flex-shrink:0;}Exercise

Instructions

Exercise