
Action selection in REINFORCE

Write the REINFORCE select_action function, which will be used by your REINFORCE agent to select an action at every step.

In DQN, the forward pass of the network returned Q-values; in REINFORCE, it returns action probabilities, from which an action can be sampled directly.
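To make the contrast concrete, here is a minimal sketch with made-up numbers. The tensors `q_values` and `action_probs` are hypothetical stand-ins for network outputs on a single state with four actions, and greedy `argmax` selection is assumed for the DQN side (exploration strategies such as epsilon-greedy are ignored here).

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical outputs for one state with 4 actions (not from the course).
q_values = torch.tensor([1.0, 3.0, 0.5, 2.0])      # DQN-style: one value per action
action_probs = torch.tensor([0.1, 0.6, 0.1, 0.2])  # REINFORCE-style: sums to 1

# DQN-style selection: pick the action with the highest Q-value.
greedy_action = torch.argmax(q_values).item()

# REINFORCE-style selection: draw an action according to its probability.
dist = Categorical(probs=action_probs)
sampled_action = dist.sample().item()

print('Greedy action from Q-values:', greedy_action)
print('Action sampled from probabilities:', sampled_action)
```

The greedy choice is deterministic given the Q-values, whereas the sampled action varies from call to call, with action 1 being the most likely draw in this example.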

A policy network and a state have been loaded in your environment.

torch.distributions.Categorical has been imported as Categorical.
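As a quick reminder of how `Categorical` behaves, the sketch below builds a distribution from a hypothetical probability vector (`probs` is invented for illustration) and checks that `log_prob` returns the log of the sampled action's probability.

```python
import torch
from torch.distributions import Categorical

# Hypothetical probabilities over 3 actions (illustrative only).
probs = torch.tensor([0.25, 0.25, 0.5])

# The first positional argument of Categorical is `probs`.
dist = Categorical(probs)

# sample() returns a 0-dimensional integer tensor holding the action index.
action = dist.sample()

# log_prob() returns the log-probability of that index under the distribution.
log_prob = dist.log_prob(action)

print('Sampled index:', action.item())
print('Matches log(probs[action]):',
      torch.isclose(log_prob, torch.log(probs[action])).item())
```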

This exercise is part of the course

Deep Reinforcement Learning in Python


Instructions

  • Obtain the action probabilities as a torch tensor.
  • Obtain the torch Distribution corresponding to the action probabilities.
  • Sample an action from the distribution.
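The three steps above can be sketched outside the exercise environment as follows. This is one possible arrangement, not the course's reference solution: the `policy_network` here is a hypothetical stand-in (8 state features, 4 actions, softmax output), since the real preloaded network's architecture is not shown.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical stand-in for the preloaded policy network:
# 8 state features in, probabilities over 4 actions out (sizes are assumptions).
policy_network = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
    nn.Softmax(dim=-1),
)
state = torch.rand(8)

# Step 1: the forward pass yields the action probabilities as a tensor.
action_probs = policy_network(state)

# Step 2: wrap the probabilities in a categorical distribution.
action_dist = Categorical(action_probs)

# Step 3: sample an action index from the distribution.
action = action_dist.sample()

# The log-probability stays attached to the graph, so it can be used in a loss.
log_prob = action_dist.log_prob(action)

print('Action probabilities:', action_probs)
print('Sampled action index:', action.item())
```

Because the softmax layer sits inside the stand-in network, its output can be passed to `Categorical` as probabilities; a network that emitted raw scores instead would need `Categorical(logits=...)`.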

Hands-on interactive exercise

Try this exercise by completing this sample code.

def select_action(policy_network, state):
  # Obtain the action probabilities
  action_probs = ____
  print('Action probabilities:', action_probs)
  # Instantiate the action distribution
  action_dist = Categorical(____)
  # Sample an action from the distribution
  action = ____
  log_prob = action_dist.log_prob(action)
  return action.item(), log_prob.reshape(1)

state = torch.rand(8)
action, log_prob = select_action(policy_network, state)
print('Sampled action index:', action)
print(f'Log probability of sampled action: {log_prob.item():.2f}')