Action selection in REINFORCE
Write the REINFORCE select_action function, which will be used by your REINFORCE agent to select an action at every step.
In DQN, the forward pass of the network returned Q-values; in REINFORCE, it returns action probabilities, from which an action can directly be sampled.
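The contrast between the two kinds of network output can be sketched as follows. The tensors below are made-up values for a hypothetical 4-action environment, not outputs of the exercise's networks. The DQN side ignores exploration for simplicity, and torch.multinomial is used here only as one possible way to sample an index from a probability vector.

```python
import torch

# Hypothetical outputs for a 4-action environment (values are made up).
q_values = torch.tensor([1.2, -0.3, 0.8, 2.1])      # DQN-style forward pass
action_probs = torch.tensor([0.1, 0.2, 0.3, 0.4])   # REINFORCE-style forward pass

# DQN (greedy case): pick the action with the largest Q-value.
greedy_action = torch.argmax(q_values).item()

# REINFORCE: draw an action index at random, weighted by its probability.
torch.manual_seed(0)  # seed only to make the illustration repeatable
sampled_action = torch.multinomial(action_probs, num_samples=1).item()

print('Greedy action from Q-values:', greedy_action)
print('Action sampled from probabilities:', sampled_action)
```

The greedy choice is deterministic for a given set of Q-values, whereas the sampled action can differ from call to call, which is where a policy-gradient agent's exploration comes from.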
A policy network and a state have been loaded in your environment.
torch.distributions.Categorical has been imported as Categorical.
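As a brief illustration of how Categorical behaves, the sketch below builds a distribution from a hand-written probability vector (an assumption for illustration, not the output of the loaded policy network) and checks that log_prob returns the natural log of the sampled action's probability.

```python
import math

import torch
from torch.distributions import Categorical

# Made-up probabilities over 4 actions; they sum to 1.
probs = torch.tensor([0.1, 0.2, 0.3, 0.4])

# The first positional argument of Categorical is the probability vector.
dist = Categorical(probs)

# sample() returns a 0-dimensional integer tensor holding the action index.
action = dist.sample()

# log_prob() returns log(probs[action]) for the sampled index.
log_prob = dist.log_prob(action)

print('Sampled index:', action.item())
print('Log probability:', round(log_prob.item(), 4))
print('Matches log(probs[action]):',
      math.isclose(log_prob.item(), math.log(probs[action].item()), abs_tol=1e-5))
```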
This exercise is part of the course
Deep Reinforcement Learning in Python
Instructions
- Obtain the action probabilities as a torch tensor.
- Obtain the torch Distribution corresponding to the action probabilities.
- Sample an action from the distribution.
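The three steps above could be sketched roughly as follows. This is one possible reading, not necessarily the course's reference solution. The policy network here is a hypothetical stand-in: the 8-dimensional input matches the state used in the exercise, but the hidden size of 64, the 4 actions, and the final softmax layer are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical policy network (assumed architecture): 8 state features in,
# 4 action probabilities out. The softmax makes the outputs sum to 1.
policy_network = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Linear(64, 4),
    nn.Softmax(dim=-1),
)

def select_action(policy_network, state):
    # Step 1: the forward pass yields the action probabilities as a tensor.
    action_probs = policy_network(state)
    # Step 2: wrap the probabilities in a categorical distribution.
    action_dist = Categorical(action_probs)
    # Step 3: sample an action index from the distribution.
    action = action_dist.sample()
    # The log probability stays attached to the graph for the later update.
    log_prob = action_dist.log_prob(action)
    return action.item(), log_prob.reshape(1)

state = torch.rand(8)
action, log_prob = select_action(policy_network, state)
print('Sampled action index:', action)
print(f'Log probability of sampled action: {log_prob.item():.2f}')
```

Returning the log probability as a tensor of shape (1,) rather than a Python float keeps it differentiable with respect to the network parameters.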
Hands-on interactive exercise
Try this exercise by completing this sample code.
def select_action(policy_network, state):
    # Obtain the action probabilities
    action_probs = ____
    print('Action probabilities:', action_probs)
    # Instantiate the action distribution
    action_dist = Categorical(____)
    # Sample an action from the distribution
    action = ____
    log_prob = action_dist.log_prob(action)
    return action.item(), log_prob.reshape(1)

state = torch.rand(8)
action, log_prob = select_action(policy_network, state)
print('Sampled action index:', action)
print(f'Log probability of sampled action: {log_prob.item():.2f}')