Deep Reinforcement Learning in Python

Training the REINFORCE algorithm

You are ready to train your Lunar Lander agent using REINFORCE! All you need to do is implement the REINFORCE training loop, including the REINFORCE loss calculation.

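Because the loss calculation spans both the inner loop (over the steps of an episode) and the outer loop (over episodes), you will not use a calculate_loss() function this time. Instead, at each step of the inner loop you record the log probability of the selected action and add the discounted reward to the episode return.

When the episode is complete, you can use both of those quantities to calculate the loss.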
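The per-step bookkeeping can be sketched in plain Python. This is a minimal sketch, not the exercise's code: it assumes the rewards and the probabilities of the selected actions have already been collected into lists, whereas in the exercise they arrive one step at a time from the environment and a PyTorch policy network. The name `track_episode` is hypothetical.

```python
import math


def track_episode(rewards, selected_action_probs, gamma=0.99):
    """Return (episode_return, episode_log_probs) for one finished episode.

    rewards[t]: reward received at step t (hypothetical input).
    selected_action_probs[t]: probability the policy gave the action taken at step t.
    """
    episode_return = 0.0
    episode_log_probs = []
    for step, (reward, prob) in enumerate(zip(rewards, selected_action_probs)):
        # Keep the log probability of the selected action for the loss.
        episode_log_probs.append(math.log(prob))
        # Add this step's reward, discounted by gamma ** step.
        episode_return += (gamma ** step) * reward
    return episode_return, episode_log_probs


ret, log_probs = track_episode([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], gamma=0.5)
print(ret)  # 1.0 + 0.5 + 0.25 = 1.75
```

Accumulating `gamma ** step * reward` inside the loop gives the same discounted return as summing over the finished episode, so nothing needs to be stored except the running total and the log probabilities.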

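For reference, this is the expression for the REINFORCE loss function, where $R_\tau$ is the discounted return of the episode $\tau$, $\gamma$ is the discount factor, $r_t$ is the reward received at step $t$, and $\pi_\theta(a_t \mid s_t)$ is the probability the policy assigns to the action selected at step $t$:

$$L(\theta) = -R_\tau \sum_{t=0}^{T-1} \log \pi_\theta(a_t \mid s_t), \qquad R_\tau = \sum_{t=0}^{T-1} \gamma^{t} r_t$$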
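The REINFORCE loss multiplies the whole-episode return by the sum of the per-step log probabilities and negates the result, $L(\theta) = -R_\tau \sum_t \log \pi_\theta(a_t \mid s_t)$. A minimal plain-Python sketch of that arithmetic, with a small worked example (`reinforce_loss` is a hypothetical helper name; the exercise computes this inline on PyTorch tensors):

```python
import math


def reinforce_loss(episode_return, episode_log_probs):
    # L(theta) = -R_tau * sum_t log pi_theta(a_t | s_t)
    return -episode_return * sum(episode_log_probs)


# Worked example: R_tau = 1.75 and three actions each taken with probability 0.5.
loss = reinforce_loss(1.75, [math.log(0.5)] * 3)
print(round(loss, 4))  # 1.75 * 3 * ln 2, about 3.639
```

Because log probabilities are negative, the loss is positive here; minimizing it with gradient descent raises the log probabilities of the actions taken, in proportion to $R_\tau$. In PyTorch, the per-step log-probability tensors would typically be combined (for example with `torch.stack`) before summing, so that autograd can backpropagate through the loss.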

You will again use the describe_episode() function to print out how your agent is doing at each episode.

Instructions

  • Append the log probability of the selected action to the episode log probabilities.
  • Increment the episode return with the discounted reward of the current step.
  • Calculate the REINFORCE episode loss.
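The three steps above can be seen end to end in a minimal, framework-free sketch. It is not the exercise's solution: it swaps Lunar Lander for a hypothetical one-state toy environment (`toy_step`, where action 1 earns reward 1 and action 0 earns nothing) and swaps PyTorch autograd for the hand-derived gradient of a softmax policy, $\partial \log \pi(a) / \partial \theta_k = \mathbf{1}[k = a] - \pi(k)$. The names `train_reinforce`, `toy_step`, and `softmax` are illustrative.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_step(action):
    # Hypothetical stand-in for env.step(): action 1 earns reward 1.0, action 0 earns 0.0.
    return 1.0 if action == 1 else 0.0


def train_reinforce(num_episodes=500, steps_per_episode=5, gamma=0.99, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # tabular softmax policy parameters theta, one per action
    losses = []
    for episode in range(num_episodes):  # outer loop: episodes
        episode_return = 0.0
        episode_log_probs = []
        grad_log_prob_sum = [0.0, 0.0]  # running sum of d log pi(a_t) / d theta
        for step in range(steps_per_episode):  # inner loop: steps
            probs = softmax(logits)
            action = 0 if rng.random() < probs[0] else 1
            reward = toy_step(action)
            # Append the log probability of the selected action.
            episode_log_probs.append(math.log(probs[action]))
            # Softmax policy: d log pi(a) / d theta_k = 1[k == a] - pi(k).
            for k in range(len(logits)):
                grad_log_prob_sum[k] += (1.0 if k == action else 0.0) - probs[k]
            # Increment the episode return with the discounted reward.
            episode_return += (gamma ** step) * reward
        # REINFORCE episode loss: -R_tau * sum_t log pi(a_t | s_t).
        losses.append(-episode_return * sum(episode_log_probs))
        # Gradient descent on the loss: d loss / d theta_k = -R_tau * sum_t grad_k log pi.
        for k in range(len(logits)):
            logits[k] -= lr * (-episode_return * grad_log_prob_sum[k])
    return softmax(logits), losses


probs, losses = train_reinforce()
print(probs)  # the probability of the rewarding action 1 should approach 1
```

The structure mirrors the exercise: log probabilities and the discounted return are accumulated in the inner loop, and the loss and parameter update happen once per episode in the outer loop. Without a baseline the return is never negative here, so every episode reinforces the actions it took, but high-return episodes (mostly action 1) push harder, which is what drives the policy toward the better action.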