
The clipped probability ratio

You will now implement the clipped probability ratio, an essential component of the PPO objective function.

For reference, the probability ratio is defined as: $$r_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}$$

And the clipped probability ratio is: $$\mathrm{clip}(r_t(\theta),\ 1-\varepsilon,\ 1+\varepsilon)$$
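
As a quick worked example using the values from the sample code below, suppose $\pi_\theta(a_t|s_t) = 0.5$, $\pi_{\theta_{old}}(a_t|s_t) = 0.4$, and $\varepsilon = 0.2$. Then $$r_t(\theta) = \frac{0.5}{0.4} = 1.25, \qquad \mathrm{clip}(1.25,\ 0.8,\ 1.2) = 1.2.$$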

This exercise is part of the course Deep Reinforcement Learning in Python.

Exercise instructions

  • Obtain the action probability prob from action_log_prob, and prob_old from action_log_prob_old.
  • Detach the old action probability prob_old from the torch gradient computation graph (see the sketch after this list).
  • Calculate the probability ratio.
  • Clip the probability ratio to the interval [1 - epsilon, 1 + epsilon].

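The first two steps rely on two PyTorch facts: exponentiating a log probability recovers the probability, and Tensor.detach() returns a tensor with the same value that no longer participates in gradient computation. A minimal sketch illustrating both (the variable names log_p, p, and p_detached are illustrative, not part of the exercise):

import torch

log_p = torch.tensor(0.4, requires_grad=True).log()
p = log_p.exp()             # exp(log(0.4)) recovers 0.4; still tracks gradients
p_detached = p.detach()     # same value, but excluded from the autograd graph
print(p.requires_grad, p_detached.requires_grad)  # True False
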
Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import torch

log_prob = torch.tensor(.5).log()
log_prob_old = torch.tensor(.4).log()

def calculate_ratios(action_log_prob, action_log_prob_old, epsilon):
    # Obtain prob and prob_old
    prob = ____
    prob_old = ____
    # Detach the old action probability from the gradient graph
    prob_old_detached = ____.____()
    # Calculate the probability ratio
    ratio = ____ / ____
    # Apply clipping
    clipped_ratio = torch.____(ratio, ____, ____)
    print(f"+{'-'*29}+\n|         Ratio: {str(ratio)} |\n| Clipped ratio: {str(clipped_ratio)} |\n+{'-'*29}+\n")
    return (ratio, clipped_ratio)

_ = calculate_ratios(log_prob, log_prob_old, epsilon=.2)
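
If you want to check your work, here is one plausible completion, assuming the blanks are meant to be filled with Tensor.exp(), Tensor.detach(), and torch.clamp() (torch.clip is an equivalent alias); the printed output is simplified to a single line:

import torch

def calculate_ratios(action_log_prob, action_log_prob_old, epsilon):
    # Recover probabilities from log probabilities
    prob = action_log_prob.exp()
    prob_old = action_log_prob_old.exp()
    # Exclude the old probability from gradient computation
    prob_old_detached = prob_old.detach()
    # Probability ratio pi_theta / pi_theta_old
    ratio = prob / prob_old_detached
    # Clip the ratio to [1 - epsilon, 1 + epsilon]
    clipped_ratio = torch.clamp(ratio, 1 - epsilon, 1 + epsilon)
    print(f"Ratio: {ratio}, clipped ratio: {clipped_ratio}")
    return ratio, clipped_ratio

log_prob = torch.tensor(0.5).log()
log_prob_old = torch.tensor(0.4).log()
_ = calculate_ratios(log_prob, log_prob_old, epsilon=0.2)
# Expected: ratio of about 1.25, clipped ratio of about 1.2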