
Implementing double Q-learning update rule

Double Q-learning is an extension of the Q-learning algorithm that helps to reduce overestimation of action values by maintaining and updating two separate Q-tables. By decoupling the action selection from the action evaluation, Double Q-learning provides a more accurate estimation of the Q-values. This exercise guides you through implementing the Double Q-learning update rule. A list Q containing two Q-tables has been generated.

The numpy library has been imported as np, and gamma and alpha values have been pre-loaded. The update formulas are below:

Q1(s, a) ← Q1(s, a) + α [ r + γ Q2(s', argmax_a' Q1(s', a')) − Q1(s, a) ]

Q2(s, a) ← Q2(s, a) + α [ r + γ Q1(s', argmax_a' Q2(s', a')) − Q2(s, a) ]

This exercise is part of the course Reinforcement Learning with Gymnasium in Python.

Exercise instructions

  • Randomly decide which Q-table within Q to update for the action value estimation by computing its index i.
  • Perform the necessary steps to update Q[i].

Hands-on interactive exercise

Try this exercise by completing the sample code.

Q = [np.random.rand(8, 4), np.random.rand(8, 4)]

def update_q_tables(state, action, reward, next_state):
    # Get the index of the table to update
    i = ____
    # Update Q[i]
    best_next_action = ____
    Q[i][state, action] = ____
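As a reference, here is one way the blanks could be filled in, following the update rules above. This is a sketch, not the official solution: `gamma` and `alpha` are pre-loaded in the exercise environment, so the values below are placeholders chosen for illustration. The key point is that the table `Q[i]` selects the greedy next action, while the other table `Q[1 - i]` evaluates it.

```python
import numpy as np

# Placeholder values standing in for the pre-loaded variables.
gamma, alpha = 0.9, 0.1
Q = [np.random.rand(8, 4), np.random.rand(8, 4)]

def update_q_tables(state, action, reward, next_state):
    # Randomly pick which of the two Q-tables to update
    i = np.random.randint(2)
    # Select the best next action according to the table being updated...
    best_next_action = np.argmax(Q[i][next_state])
    # ...but evaluate that action with the other table,
    # decoupling action selection from action evaluation
    Q[i][state, action] = (1 - alpha) * Q[i][state, action] + alpha * (
        reward + gamma * Q[1 - i][next_state, best_next_action]
    )
```

Note that the assignment above is algebraically the same as the incremental form Q[i](s, a) + α [r + γ Q[1−i](s', a*) − Q[i](s, a)], just with the terms grouped differently.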