Exercise

The clipped surrogate objective function

Implement the calculate_loss() function for PPO. This means coding PPO's key innovation, the clipped surrogate loss function, which constrains each policy update so that the new policy cannot move too far from the previous policy in a single step.

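The formula for the clipped surrogate objective is

L^{CLIP}(\theta) = \mathbb{E}_t [ \min( r_t(\theta) A_t, \mathrm{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon) A_t ) ]

where r_t(\theta) = \pi_\theta(a_t | s_t) / \pi_{\theta_{old}}(a_t | s_t) is the probability ratio and A_t is the advantage estimate at step t.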

Your environment has the clipping hyperparameter epsilon set to 0.2.
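
To see what epsilon = 0.2 does in practice, here is a small worked example. The numbers are illustrative assumptions, not values from the exercise; r denotes the probability ratio and A the advantage estimate. With epsilon = 0.2 the ratio is clipped to the interval [0.8, 1.2].

```latex
% Illustrative values only (assumed, not from the exercise); \epsilon = 0.2
\begin{align*}
r = 1.5,\ A = 2:  &\quad \min(1.5 \cdot 2,\ 1.2 \cdot 2) = \min(3.0,\ 2.4) = 2.4 \\
r = 1.5,\ A = -2: &\quad \min(1.5 \cdot (-2),\ 1.2 \cdot (-2)) = \min(-3.0,\ -2.4) = -3.0 \\
r = 0.5,\ A = 2:  &\quad \min(0.5 \cdot 2,\ 0.8 \cdot 2) = \min(1.0,\ 1.6) = 1.0
\end{align*}
```

In the first case the clip caps the gain from raising the probability of a good action. In the other two cases the unclipped term is the smaller one, so the minimum keeps the more pessimistic value.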

Instructions

  • Obtain the probability ratios between \pi_\theta and \pi_{\theta_{old}} (unclipped and clipped versions).
  • Calculate the surrogate objectives (unclipped and clipped versions).
  • Calculate the PPO clipped surrogate objective.
  • Calculate the actor loss.
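
The four steps above can be sketched for a single timestep in plain Python. This is a minimal sketch under stated assumptions, not the exercise's reference solution: the argument names (action_log_prob, action_log_prob_old, advantage) and the scalar signature are hypothetical, and it assumes the policies are given as log-probabilities of the chosen action. In a tensor framework the same steps would use that library's element-wise exponential, clamp, and minimum operations.

```python
import math


def calculate_loss(action_log_prob, action_log_prob_old, advantage, epsilon=0.2):
    """Clipped surrogate actor loss for one timestep (illustrative sketch).

    All argument names are assumptions made for this example.
    """
    # Step 1: probability ratio pi_theta / pi_theta_old, computed from
    # log-probabilities, plus its clipped version in [1 - eps, 1 + eps].
    ratio = math.exp(action_log_prob - action_log_prob_old)
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)

    # Step 2: unclipped and clipped surrogate objectives.
    surr1 = ratio * advantage
    surr2 = clipped_ratio * advantage

    # Step 3: the PPO clipped surrogate objective is the smaller of the two.
    objective = min(surr1, surr2)

    # Step 4: the objective is maximized, so the loss to minimize is its negative.
    actor_loss = -objective
    return actor_loss


# Example call with assumed values: new probability 0.6, old probability 0.4.
loss = calculate_loss(math.log(0.6), math.log(0.4), advantage=2.0)
```

Taking the minimum means the clip only ever removes the incentive to push the ratio further outside [1 - epsilon, 1 + epsilon]; it never makes the objective more optimistic than the unclipped term.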