Training the barebone DQN
It's time to train a barebone DQN algorithm in the Lunar Lander environment. Keep in mind that this is still a bare algorithm, so its performance won't be great, but you'll improve on it later.
Think of it as the first step towards getting your Lunar Lander to land on the Moon!
The q_network instance that you defined earlier is available to you.
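If you need a refresher, such a network might look something like the sketch below. The architecture and learning rate are illustrative assumptions, not the exact values from the earlier exercise; only the input and output sizes are fixed by Lunar Lander's 8-dimensional observation and 4 discrete actions.

import torch.nn as nn
import torch.optim as optim

# Illustrative sketch only; the actual q_network was defined in an
# earlier exercise. Lunar Lander observations have 8 dimensions,
# and there are 4 discrete actions.
q_network = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Linear(64, 4),  # one Q-value per action
)
optimizer = optim.Adam(q_network.parameters(), lr=0.001)  # lr is an assumption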
Throughout the exercises in this course, your Python environment also comes with a describe_episode() function that prints information at the end of each episode about how the agent fared.
This exercise is part of the course Deep Reinforcement Learning in Python.
Instructions
- Select the agent's action in the inner loop.
- Calculate the loss.
- Perform a gradient descent step to update the network weights.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
for episode in range(10):
    state, info = env.reset()
    done = False
    step = 0
    episode_reward = 0
    while not done:
        step += 1
        # Select the action
        action = ____(____, ____)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Calculate the loss
        loss = ____(q_network, state, action, next_state, reward, done)
        optimizer.zero_grad()
        # Perform a gradient descent step
        loss.____
        optimizer.____
        state = next_state
        episode_reward += reward
    describe_episode(episode, reward, episode_reward, step)
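For reference, here is one possible completed version. This is a sketch, not the official solution: select_action() and calculate_loss() stand in for the helper functions built in earlier exercises, so their names and signatures are assumptions; the backward() and step() calls are standard PyTorch.

# A possible completed version (a sketch, not the official solution).
# select_action() and calculate_loss() are assumed helpers from earlier
# in the course; their exact names and signatures may differ.
for episode in range(10):
    state, info = env.reset()
    done = False
    step = 0
    episode_reward = 0
    while not done:
        step += 1
        # Select the action for the current state using the Q-network
        action = select_action(q_network, state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Calculate the loss for this transition
        loss = calculate_loss(q_network, state, action, next_state, reward, done)
        optimizer.zero_grad()
        # Perform a gradient descent step: backpropagate, then update weights
        loss.backward()
        optimizer.step()
        state = next_state
        episode_reward += reward
    describe_episode(episode, reward, episode_reward, step)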