DRL training loop

To allow the agent to experience the environment repeatedly, you need to set up a training loop.

Many DRL algorithms share this core structure:

  1. Loop through episodes
  2. Loop through steps within each episode
  3. At each step, choose an action, calculate the loss, and update the network
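The three-level structure above can be sketched without any deep learning library. The example below is a minimal illustration, not the course's solution: ToyEnv is a hypothetical stand-in that only mimics the reset()/step() return shapes used in this exercise, the action is chosen at random, and the loss calculation and network update are replaced by a simple counter.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment; each episode lasts a fixed number of steps."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}  # (state, info)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = self.t >= self.episode_length
        truncated = False  # this toy environment has no time limit
        return self.t, reward, terminated, truncated, {}


def run_training(env, num_episodes=3, seed=0):
    rng = random.Random(seed)
    updates = 0
    for episode in range(num_episodes):      # 1. loop through episodes
        state, info = env.reset()
        done = False
        while not done:                      # 2. loop through steps
            action = rng.choice([0, 1])      # 3. choose an action...
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            updates += 1                     # ...stand-in for loss + network update
            state = next_state
    return updates


total_updates = run_training(ToyEnv(episode_length=5), num_episodes=3)
print(total_updates)  # 3 episodes x 5 steps = 15
```

In a real agent, the counter increment would be replaced by the loss computation and the optimizer update.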

You are provided with placeholder select_action() and calculate_loss() functions that allow the code to run. The Network and optimizer defined in the previous exercise are also available to you.

This exercise is part of the course Deep Reinforcement Learning in Python.

Exercise instructions

  • Ensure that the outer loop (over episodes) runs for ten episodes.
  • Ensure that the inner loop (over steps) runs until the episode is complete.
  • Take the action selected by select_action() in the env environment.
  • At the end of an inner loop iteration, update the state before starting the next step.
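An episode is "complete" when the environment reports either terminated (a natural end, such as reaching a goal or crashing) or truncated (an external cutoff, such as a step limit). The sketch below illustrates both cases with a hypothetical WalkEnv that follows the same step() return shape as the exercise; the class, its parameters, and the lambda policies are assumptions made for illustration only.

```python
class WalkEnv:
    """Hypothetical 1-D walk: reaching `goal` terminates; `max_steps` truncates."""

    def __init__(self, goal=3, max_steps=5):
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos, {}  # (state, info)

    def step(self, action):
        self.pos += action  # action is 0 (stay) or 1 (move forward)
        self.steps += 1
        terminated = self.pos >= self.goal
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, policy):
    state, info = env.reset()
    done = False
    steps = 0
    while not done:
        action = policy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated  # either flag ends the episode
        state = next_state              # carry the new state into the next step
        steps += 1
    return state, steps, terminated, truncated


reached = run_episode(WalkEnv(), policy=lambda s: 1)  # always move: goal reached
stalled = run_episode(WalkEnv(), policy=lambda s: 0)  # never move: step limit hit
print(reached)  # (3, 3, True, False)
print(stalled)  # (0, 5, False, True)
```

Forgetting the state update at the end of the step would leave the policy acting on a stale observation for the whole episode.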

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

env = gym.make("LunarLander-v2")
# Run ten episodes
for episode in ____:
    state, info = env.reset()
    done = False    
    # Run through steps until done
    while ____:
        action = select_action(network, state)        
        # Take the action
        next_state, reward, terminated, truncated, _ = ____
        done = terminated or truncated        
        loss = calculate_loss(network, state, action, next_state, reward, done)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()        
        # Update the state
        state = ____
    print(f"Episode {episode} complete.")