Reinforcement Learning with Gymnasium in Python

Exercise

Implementing the SARSA update rule

SARSA is an on-policy reinforcement learning (RL) algorithm that updates the action-value function using both the action just taken and the action the agent selects in the next state. Because the update target uses the Q-value of the next state-action pair actually chosen by the current policy, the learned values reflect the actions the agent will really take. The SARSA update rule is shown below, and your task is to implement a function that updates a Q-table based on this rule.

The NumPy library has been imported for you as np.

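The SARSA update rule in its standard form, where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r$ is the reward, and $(s', a')$ is the next state-action pair:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$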

Instructions

  • Retrieve the current Q-value for the given state-action pair.
  • Find the Q-value for the next state-action pair.
  • Update the Q-value for the current state-action pair using the SARSA formula.
  • Update the Q-table Q, given that an agent takes action 0 in state 0, receives a reward of 5, moves to state 1, and performs action 1.
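The steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: the function name `update_q_table`, its signature, the 2x2 initial Q-table, and the values `alpha = 0.1` and `gamma = 0.8` are all assumptions chosen for the example, since the exercise text does not specify them. The update uses the standard SARSA form Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)).

```python
import numpy as np

def update_q_table(Q, state, action, reward, next_state, next_action, alpha, gamma):
    """Apply one SARSA update to Q in place and return it (hypothetical helper)."""
    # Retrieve the current Q-value for the given state-action pair.
    old_value = Q[state, action]
    # Find the Q-value for the next state-action pair.
    next_value = Q[next_state, next_action]
    # Move the old estimate toward the SARSA target r + gamma * Q(s', a').
    Q[state, action] = old_value + alpha * (reward + gamma * next_value - old_value)
    return Q

# Assumed values for illustration only; the exercise does not state them.
alpha = 0.1   # learning rate
gamma = 0.8   # discount factor
Q = np.array([[10.0, 0.0],
              [0.0, 20.0]])  # rows are states, columns are actions

# The agent takes action 0 in state 0, receives a reward of 5,
# moves to state 1, and performs action 1.
update_q_table(Q, state=0, action=0, reward=5, next_state=1, next_action=1,
               alpha=alpha, gamma=gamma)
print(Q)
```

Under these assumed values, the target is 5 + 0.8 * 20 = 21, so Q[0, 0] moves from 10 toward 21 by one tenth of the gap, ending at approximately 11.1. All other entries stay unchanged, because SARSA updates only the state-action pair that was just visited.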