
Training the barebone DQN

It's time to train a barebone DQN algorithm in the Lunar Lander environment. Keep in mind this is still a bare-bones algorithm, so its performance won't be great, but you'll improve on it later.

Think of it as the first step towards getting your Lunar Lander to land on the Moon!

The q_network instance that you defined earlier is available to you.
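If you're curious what that setup might look like, here is a minimal sketch: a small fully connected network mapping Lunar Lander's 8-dimensional observation to one Q-value per action, plus the optimizer the loop below relies on. The layer sizes, learning rate, and environment id are illustrative assumptions, not the course's exact definitions.

import gymnasium as gym
import torch
import torch.nn as nn

# Hypothetical setup: the course defines its own env, q_network, and optimizer
env = gym.make("LunarLander-v2")  # the id may be "LunarLander-v3" in newer gymnasium

q_network = nn.Sequential(
    nn.Linear(8, 64),   # Lunar Lander observations have 8 dimensions; hidden size is illustrative
    nn.ReLU(),
    nn.Linear(64, 4),   # one Q-value for each of the 4 discrete actions
)

optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)  # learning rate is an assumption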

Throughout the exercises in this course, your Python environment also comes with a describe_episode() function that prints information at the end of each episode about how the agent fared.
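Its implementation isn't shown; a minimal stand-in with the same call signature used in the loop below might look like this (the printed format is an assumption):

def describe_episode(episode, reward, episode_reward, step):
    # Hypothetical stand-in: the course environment provides its own version
    print(f"Episode {episode} | steps: {step} | "
          f"final reward: {reward:.2f} | episode return: {episode_reward:.2f}")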

This exercise is part of the course

Deep Reinforcement Learning in Python


Exercise instructions

  • Select the agent's action in the inner loop (see the helper sketches after this list).
  • Calculate the loss.
  • Perform a gradient descent step to update the network weights.
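The first two steps call helpers from the course environment whose bodies aren't given in the exercise. One plausible sketch, assuming greedy action selection and a one-step TD target with a mean-squared-error loss (the names select_action and calculate_loss match the call sites in the loop, but the implementations here are assumptions):

import torch
import torch.nn.functional as F

gamma = 0.99  # discount factor; the course's exact value is an assumption

def select_action(q_network, state):
    # Greedy action: pick the action with the highest predicted Q-value
    state_t = torch.as_tensor(state, dtype=torch.float32)
    with torch.no_grad():
        return q_network(state_t).argmax().item()

def calculate_loss(q_network, state, action, next_state, reward, done):
    # One-step TD target: r + gamma * max_a' Q(s', a'), cut off at episode end
    state_t = torch.as_tensor(state, dtype=torch.float32)
    next_state_t = torch.as_tensor(next_state, dtype=torch.float32)
    q_value = q_network(state_t)[action]
    with torch.no_grad():
        target = reward + gamma * q_network(next_state_t).max() * (1 - done)
    return F.mse_loss(q_value, target)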

Hands-on interactive exercise

Try this exercise by completing this sample code.
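One way to fill in the blanks is shown below. The helper names select_action and calculate_loss are assumptions that match the sketches above and the call signature already present in the loss line; loss.backward() and optimizer.step() are standard PyTorch.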

for episode in range(10):
    state, info = env.reset()
    done = False
    step = 0
    episode_reward = 0
    while not done:
        step += 1
        # Select the action (helper name assumed from the course environment)
        action = select_action(q_network, state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Calculate the loss (course-provided helper)
        loss = calculate_loss(q_network, state, action, next_state, reward, done)
        optimizer.zero_grad()
        # Perform a gradient descent step: backpropagate, then update the weights
        loss.backward()
        optimizer.step()
        state = next_state
        episode_reward += reward
    describe_episode(episode, reward, episode_reward, step)
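Note on the design: this barebone loop takes one gradient step per environment transition and learns only from the single most recent experience, which is part of why its returns stay low at this stage. These are the kinds of limitations the later improvements mentioned above are meant to address.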