Implementing the SARSA update rule

SARSA is an on-policy reinforcement learning (RL) algorithm that updates the action-value function using the current state-action pair, the reward received, the next state, and the action actually selected in that next state (hence the name: State, Action, Reward, State, Action). Because each update bootstraps from the value of the action the agent's own policy chooses next, the learned values reflect the policy being followed, including the actions it will take in the future. The SARSA update rule is given below; your task is to implement a function that updates a Q-table according to this rule.

The NumPy library has been imported for you as np.

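SARSA update rule:

$$Q(s, a) \leftarrow (1 - \alpha)\,Q(s, a) + \alpha\left[r + \gamma\,Q(s', a')\right]$$

where α is the learning rate, γ is the discount factor, r is the reward received after taking action a in state s, and a' is the action selected in the next state s'.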
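The rule is often written in two equivalent ways: the blended form, (1 − α)Q(s, a) + α[r + γQ(s', a')], and the incremental form, Q(s, a) + α[r + γQ(s', a') − Q(s, a)], where the bracketed difference is the temporal-difference error. A quick numeric check with made-up values (α = 0.1, γ = 0.9, Q(s, a) = 2, r = 1, Q(s', a') = 3; none of these come from the exercise) shows that both forms give the same result:

```python
import math

# Hypothetical values chosen only for illustration
alpha, gamma = 0.1, 0.9
q_sa = 2.0     # current estimate Q(s, a)
r = 1.0        # reward received after taking a in s
q_next = 3.0   # Q(s', a') for the action actually chosen in s'

# Blended form: keep (1 - alpha) of the old estimate, mix in alpha of the target
blended = (1 - alpha) * q_sa + alpha * (r + gamma * q_next)

# Incremental form: move the old estimate a step of size alpha toward the target
td_error = r + gamma * q_next - q_sa
incremental = q_sa + alpha * td_error

print(blended, incremental)
print(math.isclose(blended, incremental))
```

Both expressions evaluate to about 2.17, so either form can be used in the implementation.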

Retrieve the current Q-value for the given state-action pair, Q(s, a).
Retrieve the Q-value for the next state-action pair, Q(s', a').
Update the Q-value for the current state-action pair using the SARSA formula.
Update the Q-table Q for the case where the agent takes action 0 in state 0, receives a reward of 5, moves to state 1, and then takes action 1.
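The steps above can be sketched as follows. This is a minimal sketch, not an official solution: the function name `update_q_table`, the 2×2 zero-initialised Q-table, and the values `alpha = 0.1` and `gamma = 0.8` are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

# Assumed setup (not given in the text): a 2-state, 2-action problem,
# a zero-initialised Q-table, learning rate alpha and discount factor gamma.
num_states, num_actions = 2, 2
Q = np.zeros((num_states, num_actions))
alpha = 0.1
gamma = 0.8


def update_q_table(state, action, reward, next_state, next_action):
    """Apply one SARSA update to the module-level Q-table, in place."""
    # Step 1: current estimate Q(s, a)
    old_value = Q[state, action]
    # Step 2: estimate for the action the agent actually takes next, Q(s', a')
    next_value = Q[next_state, next_action]
    # Step 3: blend the old estimate with the one-step SARSA target
    Q[state, action] = (1 - alpha) * old_value + alpha * (reward + gamma * next_value)


# Agent takes action 0 in state 0, receives reward 5, moves to state 1, then takes action 1
update_q_table(0, 0, 5, 1, 1)
print(Q)  # Q[0, 0] becomes 0.5; all other entries stay 0
```

Starting from an all-zero table, the only change is Q[0, 0] = 0.1 × (5 + 0.8 × 0) = 0.5. The function reads and writes the module-level `Q`, `alpha` and `gamma` to keep the call signature short; passing them in as arguments would make it reusable across different tables and hyperparameters.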
