
Implementing the Q-learning update rule

Q-learning is an off-policy algorithm in reinforcement learning (RL) that seeks to learn the value of the best action to take in each state. Unlike SARSA, which updates its Q-values using the action the agent actually takes next, Q-learning updates them using the maximum estimated value of the next state, regardless of which action is actually taken. This distinction allows Q-learning to learn the optimal policy even while following an exploratory or entirely random behavior policy. The Q-learning update rule is shown below, and your task is to implement a function that updates a Q-table based on this rule.
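To make the contrast concrete (a brief note, not part of the exercise): SARSA's update target is r + γ Q(s', a'), where a' is the next action the agent actually takes, while Q-learning's target is r + γ max over a' of Q(s', a'), which ignores the behavior policy's next choice.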

The NumPy library has been imported for you as np.

The Q-learning update rule is:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]

where s and a are the current state and action, r is the reward received, s' is the next state, α is the learning rate, and γ is the discount factor.
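For instance, with illustrative numbers (not the values used in this exercise): if Q(s, a) = 2, the reward is 1, the best Q-value in the next state is 3, α = 0.5, and γ = 0.9, then the updated value is 2 + 0.5 × (1 + 0.9 × 3 - 2) = 2 + 0.5 × 1.7 = 2.85.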

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • Retrieve the current Q-value for the given state-action pair.
  • Determine the maximum Q-value for the next state across all possible actions in actions.
  • Update the Q-value for the current state-action pair using the Q-learning formula.
  • Update the Q-table Q, given that an agent takes action 0 in state 0, receives a reward of 5, and moves to state 1.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

actions = ['action1', 'action2']
def update_q_table(state, action, reward, next_state):
    # Get the old value of the current state-action pair
    old_value = ____
    # Determine the maximum Q-value for the next state
    next_max = ____
    # Compute the new value of the current state-action pair
    Q[state, action] = ____

alpha = 0.1
gamma = 0.95
Q = np.array([[10, 8], [20, 15]], dtype='float32')
# Update the Q-table
____
print(Q)
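One possible way to fill in the blanks, sketched here only for reference; this is an assumption about the intended solution, not the official one:

import numpy as np

actions = ['action1', 'action2']
alpha = 0.1   # learning rate
gamma = 0.95  # discount factor
Q = np.array([[10, 8], [20, 15]], dtype='float32')

def update_q_table(state, action, reward, next_state):
    # Get the old value of the current state-action pair
    old_value = Q[state, action]
    # Determine the maximum Q-value for the next state across all possible actions
    next_max = np.max(Q[next_state])
    # Compute the new value of the current state-action pair using the Q-learning rule
    Q[state, action] = old_value + alpha * (reward + gamma * next_max - old_value)

# Agent takes action 0 in state 0, receives a reward of 5, and moves to state 1
update_q_table(0, 0, 5, 1)
print(Q)

Under this sketch, Q[0, 0] changes from 10 to 10 + 0.1 × (5 + 0.95 × 20 - 10) = 11.4, while the other entries are unchanged.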