DRL training loop
To allow the agent to experience the environment repeatedly, you need to set up a training loop.
Many DRL algorithms share this core structure:
- Loop through episodes
- Loop through steps within each episode
- At each step, choose an action, calculate the loss, and update the network
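In schematic form, the structure looks like this. The snippet below uses a toy stand-in environment so it can run on its own; the real exercise uses Gymnasium's LunarLander instead, and the toy class and its random termination rule are illustrative assumptions:

    import random

    # Toy stand-in environment: episodes end at random, so this schematic
    # runs without Gymnasium. The real exercise uses LunarLander instead.
    class ToyEnv:
        def reset(self):
            return 0  # a dummy starting state

        def step(self, action):
            # Return (next_state, reward, done); the episode ends at random.
            return 0, 1.0, random.random() < 0.2

    toy_env = ToyEnv()

    for episode in range(3):          # loop through episodes
        state = toy_env.reset()
        done = False
        while not done:               # loop through steps within the episode
            action = 0                # choose an action (fixed in this toy)
            state, reward, done = toy_env.step(action)
            # a real agent would calculate the loss and update the network here
        print(f"Episode {episode} complete.")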
You are provided with placeholder select_action() and calculate_loss() functions that allow the code to run. The network and optimizer defined in the previous exercise are also available to you.
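For reference, a minimal sketch of what such placeholders might look like is below. The actual course implementations may differ: the greedy action choice and the dummy squared-output loss here are assumptions, made only so the training loop can run end to end (they assume the network maps a state to one score per discrete action):

    import torch

    def select_action(network, state):
        # Stand-in: greedily pick the action whose network output is largest,
        # assuming the network maps a state to one score per discrete action.
        state_tensor = torch.as_tensor(state, dtype=torch.float32)
        return int(network(state_tensor).argmax())

    def calculate_loss(network, state, action, next_state, reward, done):
        # Stand-in: any differentiable scalar lets backward() and step() run.
        # A real TD-style loss would also use next_state, reward, and done.
        state_tensor = torch.as_tensor(state, dtype=torch.float32)
        return network(state_tensor)[action] ** 2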
Instructions
- Ensure that the outer loop (over episodes) runs for ten episodes.
- Ensure that the inner loop (over steps) runs until the episode is complete.
- Take the action selected by select_action() in the env environment.
- At the end of an inner loop iteration, update the state before starting the next step.
    import gymnasium as gym

    env = gym.make("LunarLander-v2")

    # Run ten episodes
    for episode in range(10):
        state, info = env.reset()
        done = False

        # Run through steps until done
        while not done:
            action = select_action(network, state)
            # Take the action in the environment
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated

            # Calculate the loss and take one gradient step on the network
            loss = calculate_loss(network, state, action, next_state, reward, done)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Update the state before starting the next step
            state = next_state

        print(f"Episode {episode} complete.")
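Once the loop runs, a natural next step is to track how much reward the agent collects per episode, which is the quickest way to see whether training is making progress. Here is a small variant of the loop above; the episode_returns bookkeeping is an addition for illustration, not part of the exercise:

    episode_returns = []

    for episode in range(10):
        state, info = env.reset()
        done = False
        total_reward = 0.0

        while not done:
            action = select_action(network, state)
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated

            loss = calculate_loss(network, state, action, next_state, reward, done)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_reward += reward  # bookkeeping: accumulate the episode's reward
            state = next_state

        episode_returns.append(total_reward)
        print(f"Episode {episode}: return = {total_reward:.1f}")

    env.close()  # release the environment's resources when finished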