
Critic network

Actor Critic methods require two very different neural networks.

The architecture for the actor network is identical to that of the policy network you used for REINFORCE, so you can reuse the PolicyNetwork class.
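For context, here is a minimal sketch of what such a policy network might look like, assuming a discrete action space, a single hidden layer of 64 units, and a softmax output; the actual PolicyNetwork class from the course may differ in its details.

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super(PolicyNetwork, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, action_size)

    def forward(self, state):
        x = torch.relu(self.fc1(torch.tensor(state)))
        # Softmax turns the raw outputs into action probabilities
        action_probs = torch.softmax(self.fc2(x), dim=-1)
        return action_probs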

However, the critic network is something you haven't implemented so far. The critic aims to approximate the state value function \(V(s_t)\), rather than the action value function \(Q(s_t, a_t)\) approximated by Q-Networks.

You will now implement the Critic network module that you will use in A2C.

This exercise is part of the course Deep Reinforcement Learning in Python.

Exercise instructions

  • Fill in the desired dimension for the second fully connected layer so that it outputs one state value.
  • Obtain the value returned by the forward pass through the critic network.

Hands-on interactive exercise

Finish this exercise by completing the sample code below.

import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_size):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        # Fill in the desired dimensions
        self.fc2 = nn.Linear(____)

    def forward(self, state):
        x = torch.relu(self.fc1(torch.tensor(state)))
        # Calculate the output value
        value = ____
        return value

critic_network = Critic(8)
state_value = critic_network(torch.rand(8))
print('State value:', state_value)
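
For reference, here is one way the completed Critic could look, following the instructions above: the second layer maps the 64 hidden units to a single output, and the forward pass returns that output as the state value.

class Critic(nn.Module):
    def __init__(self, state_size):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        # One output unit: the estimated state value V(s)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, state):
        x = torch.relu(self.fc1(torch.tensor(state)))
        # The value estimate is the output of the second layer
        value = self.fc2(x)
        return value

With this completed class, the snippet above prints a one-element tensor holding the critic's value estimate for the random state.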