
Expected SARSA update rule

In this exercise, you'll implement the Expected SARSA update rule, a model-free temporal difference RL algorithm. Expected SARSA estimates the expected action value of the next state under the current policy by averaging over all possible actions, which gives a more stable update target than SARSA's single sampled next action. The formula used in Expected SARSA is shown below.

Image showing the mathematical formula of the expected SARSA update rule.
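For reference, the Expected SARSA update rule is commonly written as

Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \Big[ R_{t+1} + \gamma \sum_{a} \pi(a \mid S_{t+1}) \, Q(S_{t+1}, a) - Q(S_t, A_t) \Big]

where \alpha is the learning rate, \gamma is the discount factor, and the sum is the expected action value of the next state under the current policy \pi. When the policy is uniform over the available actions, this expectation reduces to the mean of the next state's Q-values.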

The numpy library has been imported as np.

This exercise is part of the course Reinforcement Learning with Gymnasium in Python.

Exercise instructions

  • Calculate the expected Q-value for the next_state.
  • Update the Q-value for the current state and action using the Expected SARSA formula.
  • Update the Q-table Q, assuming the agent takes action 1 in state 2, moves to state 3, and receives a reward of 5.

Hands-on interactive exercise

Complete this exercise by filling in the sample code below.

def update_q_table(state, action, next_state, reward):
    # Calculate the expected Q-value for the next state
    expected_q = ____
    # Update the Q-value for the current state and action
    Q[state, action] = ____
    
Q = np.random.rand(5, 2)
print("Old Q:\n", Q)
alpha = 0.1
gamma = 0.99

# Update the Q-table
update_q_table(____, ____, ____, ____)
print("Updated Q:\n", Q)
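A possible completed version is sketched below for reference. It assumes the expected Q-value is taken under a uniform random policy, so it is a plain average over the next state's action values; if a different behavior policy (e.g. epsilon-greedy) is intended, the expectation would instead be weighted by that policy's action probabilities.

import numpy as np

def update_q_table(state, action, next_state, reward):
    # Expected Q-value of the next state: average over all actions,
    # which equals the expectation under a uniform random policy
    expected_q = np.mean(Q[next_state])
    # Expected SARSA update: move Q(state, action) toward the TD target
    # reward + gamma * E[Q(next_state, a)]
    Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * expected_q)

Q = np.random.rand(5, 2)
print("Old Q:\n", Q)
alpha = 0.1
gamma = 0.99

# The agent takes action 1 in state 2, moves to state 3, and receives a reward of 5
update_q_table(2, 1, 3, 5)
print("Updated Q:\n", Q)

The update line is written in the equivalent form (1 - alpha) * Q[state, action] + alpha * target, which is the same as adding alpha times the TD error to the current estimate.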