Deep Reinforcement Learning in Python

Exercise

The policy network architecture

Build the architecture for a Policy Network that you can use later to train your policy gradient agent.

The policy network takes the state as input and outputs a probability distribution over the action space. The Lunar Lander environment has four discrete actions, so the network must output one probability per action, and those probabilities must sum to 1.

Instructions

  • Indicate the size for the output layer of the policy network; for flexibility, use the variable name rather than the actual number.
  • Ensure the final layer returns probabilities.
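
The two instructions above can be illustrated with a minimal, framework-free sketch. This is not the exercise's solution code: the class name `PolicyNetwork`, the parameters `state_size`, `action_size` and `hidden_size`, the random weight initialization, and the 8-dimensional state are all assumptions made for illustration. It only shows the two ideas being asked for: the output layer is sized from a variable rather than a hard-coded 4, and a softmax turns the final layer's raw scores into probabilities.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PolicyNetwork:
    """Hypothetical two-layer policy: state -> hidden (ReLU) -> action probabilities."""

    def __init__(self, state_size, action_size, hidden_size=16, seed=0):
        rng = random.Random(seed)
        # Hidden layer weights: one row per hidden unit.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_size)]
                   for _ in range(hidden_size)]
        self.b1 = [0.0] * hidden_size
        # Output layer is sized by the variable action_size, not a literal 4.
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
                   for _ in range(action_size)]
        self.b2 = [0.0] * action_size

    @staticmethod
    def _linear(weights, biases, inputs):
        """Compute weights @ inputs + biases for plain Python lists."""
        return [sum(w * x for w, x in zip(row, inputs)) + b
                for row, b in zip(weights, biases)]

    def forward(self, state):
        hidden = [max(0.0, h) for h in self._linear(self.w1, self.b1, state)]
        logits = self._linear(self.w2, self.b2, hidden)
        # The final step returns probabilities rather than raw scores.
        return softmax(logits)


# Assumed sizes: an 8-dimensional state and the four discrete actions.
state_size, action_size = 8, 4
policy = PolicyNetwork(state_size, action_size)
example_state = [0.1, -0.2, 0.3, 0.0, 0.05, -0.1, 1.0, 0.0]  # made-up values
action_probs = policy.forward(example_state)
print(action_probs)
```

Because the output size comes from `action_size`, the same class would work unchanged for an environment with a different number of discrete actions. In a deep learning framework the same structure would typically be expressed with that framework's linear and softmax layers.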