
Implementing double Q-learning update rule

Double Q-learning is an extension of the Q-learning algorithm that helps to reduce overestimation of action values by maintaining and updating two separate Q-tables. By decoupling the action selection from the action evaluation, Double Q-learning provides a more accurate estimation of the Q-values. This exercise guides you through implementing the Double Q-learning update rule. A list Q containing two Q-tables has been generated.

The numpy library has been imported as np, and gamma and alpha values have been pre-loaded. The update formulas are below:

Q1(s, a) ← Q1(s, a) + alpha * [r + gamma * Q2(s', argmax_a' Q1(s', a')) - Q1(s, a)]

Q2(s, a) ← Q2(s, a) + alpha * [r + gamma * Q1(s', argmax_a' Q2(s', a')) - Q2(s, a)]

This exercise is part of the course Reinforcement Learning with Gymnasium in Python.

Instructions

  • Randomly choose which Q-table in Q to update by computing its index i.
  • Perform the necessary steps to update Q[i].

Hands-on interactive exercise

Try this exercise by completing the sample code.

Q = [np.random.rand(8, 4), np.random.rand(8, 4)]
def update_q_tables(state, action, reward, next_state):
    # Get the index of the table to update
    i = ____
    # Update Q[i]
    best_next_action = ____
    Q[i][state, action] = ____
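One possible completion is sketched below. It assumes gamma and alpha take the illustrative values 0.99 and 0.1 (in the exercise they are pre-loaded, so the actual values may differ). The key Double Q-learning step is that the best next action is selected with the table being updated, Q[i], but its value is evaluated with the other table, Q[1 - i]:

```python
import numpy as np

# Illustrative values; in the exercise, gamma and alpha are pre-loaded
gamma, alpha = 0.99, 0.1
Q = [np.random.rand(8, 4), np.random.rand(8, 4)]

def update_q_tables(state, action, reward, next_state):
    # Randomly pick which Q-table to update (index 0 or 1)
    i = np.random.randint(2)
    # Select the greedy next action using the table being updated...
    best_next_action = np.argmax(Q[i][next_state])
    # ...but evaluate that action with the other table, Q[1 - i]
    Q[i][state, action] = Q[i][state, action] + alpha * (
        reward
        + gamma * Q[1 - i][next_state, best_next_action]
        - Q[i][state, action]
    )
```

Because the selecting table and the evaluating table are different, the positive bias that max-based Q-learning introduces is reduced, which is the point of maintaining two tables.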