Exercise

The policy network architecture

Build the architecture for a Policy Network that you can use later to train your policy gradient agent.

The policy network takes the state as input and outputs a probability distribution over the action space. The Lunar Lander environment has four discrete actions, so the network should output one probability per action, and those four probabilities should sum to 1.
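To make "a probability for each action" concrete, here is a minimal sketch of the softmax function, which is one common way to turn raw network outputs into probabilities. The `softmax` helper and the example `logits` values are illustrative assumptions, not part of the exercise.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing on large scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for the four Lunar Lander actions.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
print(probs, sum(probs))
```

The largest raw score gets the largest probability, and every action keeps a nonzero chance of being selected, which is what lets a policy gradient agent keep exploring.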

Instructions

100 XP

Indicate the size for the output layer of the policy network; for flexibility, use the variable name rather than the actual number.
Ensure the final layer returns probabilities.
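One way the two instructions above could be satisfied is sketched below, assuming PyTorch. The class name `PolicyNetwork`, the variable names `state_size` and `action_size`, the hidden width of 64, and the 8-dimensional state are assumptions made for illustration; the exercise's own scaffold may differ.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state vector to a probability for each discrete action."""

    def __init__(self, state_size, action_size, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        # Output layer sized by the variable, not a hard-coded 4.
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        # Softmax over the last dimension yields action probabilities.
        return torch.softmax(self.fc3(x), dim=-1)

# Assumed sizes: an 8-dimensional state and four discrete actions.
state_size, action_size = 8, 4
policy = PolicyNetwork(state_size, action_size)

state = torch.rand(state_size)
action_probs = policy(state)
# Sample an action index according to the predicted probabilities.
action = torch.distributions.Categorical(action_probs).sample()
print(action_probs, action.item())
```

Sizing the output layer with `action_size` means the same class can be reused for an environment with a different number of actions without editing the network definition.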
