
Expected SARSA update rule

In this exercise, you'll implement the Expected SARSA update rule, a model-free temporal difference RL algorithm. Instead of sampling a single next action as SARSA does, Expected SARSA averages the Q-values of all possible next actions, weighted by their probabilities under the current policy, which gives a more stable update target. The formula used by Expected SARSA is shown below.

[Image: the mathematical formula of the Expected SARSA update rule]
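
In case the image does not render, the standard Expected SARSA update rule can be written as:

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \sum_{a'} \pi(a' \mid s_{t+1}) \, Q(s_{t+1}, a') - Q(s_t, a_t) \Big]
$$

Here, alpha is the learning rate, gamma is the discount factor, and pi(a' | s_{t+1}) is the probability of selecting action a' in the next state under the current policy. Under a uniform random policy, the weighted sum reduces to the mean of the next state's Q-values.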

The numpy library has been imported as np.

This exercise is part of the course Reinforcement Learning with Gymnasium in Python.


Exercise instructions

  • Calculate the expected Q-value for the next_state.
  • Update the Q-value for the current state and action using the Expected SARSA formula.
  • Update the Q-table Q, assuming the agent takes action 1 in state 2, moves to state 3, and receives a reward of 5.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def update_q_table(state, action, next_state, reward):
    # Calculate the expected Q-value for the next state
    expected_q = ____
    # Update the Q-value for the current state and action
    Q[state, action] = ____

Q = np.random.rand(5, 2)  # Q-table with 5 states and 2 actions
print("Old Q:\n", Q)
alpha = 0.1   # learning rate
gamma = 0.99  # discount factor

# Update the Q-table
update_q_table(____, ____, ____, ____)
print("Updated Q:\n", Q)