
Expected SARSA update rule

In this exercise, you'll implement the Expected SARSA update rule, a model-free temporal difference (TD) RL algorithm. Instead of sampling a single next action as SARSA does, Expected SARSA averages the Q-values of all possible next actions under the current policy, which gives a more stable update target. The formula used in Expected SARSA can be found below.

Figure: the mathematical formula of the Expected SARSA update rule.
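For reference (the image itself is not reproduced here), the standard Expected SARSA update rule that this exercise is based on is:

Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \sum_{a'} \pi(a' \mid S_{t+1}) \, Q(S_{t+1}, a') - Q(S_t, A_t) \right]

Here \alpha is the learning rate, \gamma the discount factor, and \pi(a' \mid S_{t+1}) the probability of taking action a' in the next state. If the policy is assumed to be uniform over actions, as in the sketch at the end of this page, the expectation reduces to the mean of the Q-values for the next state.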

The numpy library has been imported as np.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Instructions

  • Calculate the expected Q-value for the next_state.
  • Update the Q-value for the current state and action using the Expected SARSA formula.
  • Call update_q_table() to update the Q-table Q, assuming the agent takes action 1 in state 2, moves to state 3, and receives a reward of 5.

Hands-on interactive exercise

Try this exercise by completing the sample code.

def update_q_table(state, action, next_state, reward):
    # Calculate the expected Q-value for the next state
    expected_q = ____
    # Update the Q-value for the current state and action
    Q[state, action] = ____
    
Q = np.random.rand(5, 2)
print("Old Q:\n", Q)
alpha = 0.1
gamma = 0.99

# Update the Q-table
update_q_table(____, ____, ____, ____)
print("Updated Q:\n", Q)