Expected SARSA update rule
In this exercise, you'll implement the Expected SARSA update rule, a model-free temporal difference (TD) reinforcement learning algorithm. Unlike SARSA, which bootstraps from the single action actually sampled in the next state, Expected SARSA averages the Q-values of all possible next actions, weighted by the policy's probability of choosing each one, which gives a lower-variance update target. The formula used in Expected SARSA can be found below.
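For a state s_t, action a_t, reward r_{t+1}, and next state s_{t+1}, the Expected SARSA update with learning rate alpha and discount factor gamma is:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \sum_{a} \pi(a \mid s_{t+1}) \, Q(s_{t+1}, a) - Q(s_t, a_t) \right]

Here \pi(a \mid s_{t+1}) is the probability that the policy selects action a in the next state; for example, under a uniform random policy the expectation reduces to the mean of the next state's Q-values.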

The numpy library has been imported as np.
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Exercise instructions
- Calculate the expected Q-value for the next_state.
- Update the Q-value for the current state and action using the Expected SARSA formula.
- Update the Q-table Q, supposing that an agent takes action 1 in state 2 and moves to state 3, receiving a reward of 5.
Hands-on interactive exercise
Try this exercise by completing this sample code.
def update_q_table(state, action, next_state, reward):
    # Calculate the expected Q-value for the next state
    expected_q = ____
    # Update the Q-value for the current state and action
    Q[state, action] = ____
Q = np.random.rand(5, 2)
print("Old Q:\n", Q)
alpha = 0.1
gamma = 0.99
# Update the Q-table
update_q_table(____, ____, ____, ____)
print("Updated Q:\n", Q)