Deep Reinforcement Learning in Python

Exercise

The policy network architecture

Build the architecture for a Policy Network that you can use later to train your policy gradient agent.

The policy network takes the state as input and outputs a probability distribution over the action space. The Lunar Lander environment has four discrete actions, so the network should output one probability per action, and those four probabilities should sum to 1.
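To make "a probability for each action" concrete, here is a minimal sketch of how four raw scores can be turned into probabilities with a softmax. The softmax choice, the function name `softmax`, and the example scores are assumptions made for illustration; the exercise text does not name a specific normalization.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores, one per Lunar Lander action.
action_scores = [1.0, 2.0, 0.5, -1.0]
action_probs = softmax(action_scores)
```

Each entry of `action_probs` is positive, the entries sum to 1, and the action with the highest raw score receives the highest probability.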

Instructions

  • Indicate the size for the output layer of the policy network; for flexibility, use the variable name rather than the actual number.
  • Ensure the final layer returns probabilities.
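
One way the two instructions above might be satisfied is sketched below. It assumes PyTorch, two hidden layers of 64 units, and an 8-dimensional state; none of these details are given in the exercise text, and the names `PolicyNetwork`, `state_size`, `action_size`, and `hidden_size` are illustrative rather than the course's reference solution.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state vector to a probability for each discrete action."""

    def __init__(self, state_size, action_size, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        # Output layer sized by the variable, not a hard-coded 4.
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        x = torch.as_tensor(state, dtype=torch.float32)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # Softmax over the last dimension turns scores into probabilities.
        return torch.softmax(self.fc3(x), dim=-1)

# Assumed sizes: 8 state features and 4 actions for Lunar Lander.
state_size, action_size = 8, 4
policy_network = PolicyNetwork(state_size, action_size)
action_probs = policy_network(torch.rand(state_size))
```

Passing `action_size` into the output layer keeps the class reusable for environments with a different number of actions, and applying the softmax inside `forward` means the result can be passed directly to a sampler such as `torch.distributions.Categorical`.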