Exercise

Implementing the double Q-learning update rule

Double Q-learning is an extension of the Q-learning algorithm that reduces the overestimation of action values by maintaining and updating two separate Q-tables. Standard Q-learning uses the same estimates both to select the best next action and to evaluate it, which biases the values upward. By decoupling action selection from action evaluation, double Q-learning tends to produce less biased Q-value estimates. This exercise guides you through implementing the double Q-learning update rule. A list Q containing two Q-tables has been generated.

The numpy library has been imported as np, and gamma and alpha values have been pre-loaded. The update formulas are below:

Image showing the update rule of Q1.

Image showing the update rule of Q2.
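
For reference, the double Q-learning update rules are commonly written as follows. The notation (s for the state, a for the action, r for the reward, s' for the next state) is an assumption and may differ from that used in the images.

```latex
Q_1(s, a) \leftarrow Q_1(s, a) + \alpha \left[ r + \gamma \, Q_2\!\left(s', \arg\max_{a'} Q_1(s', a')\right) - Q_1(s, a) \right]

Q_2(s, a) \leftarrow Q_2(s, a) + \alpha \left[ r + \gamma \, Q_1\!\left(s', \arg\max_{a'} Q_2(s', a')\right) - Q_2(s, a) \right]
```

In each rule, the table being updated selects the greedy next action, while the other table supplies the value of that action.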

Instructions

Randomly decide which Q-table in Q to update by computing its index i.
Perform the necessary steps to update Q[i].
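
The two steps above can be sketched as follows. This is a minimal illustration, not the exercise's official solution: the function name update_q_tables, the table shape, the alpha and gamma values, and the seeded random generator are all assumptions made so the example runs on its own.

```python
import numpy as np

# Assumed setup. The exercise pre-loads Q, alpha and gamma;
# the shape and values below are illustrative only.
n_states, n_actions = 4, 2
Q = [np.zeros((n_states, n_actions)) for _ in range(2)]
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable


def update_q_tables(state, action, reward, next_state):
    """Apply one double Q-learning update and return the updated table's index."""
    # Step 1: randomly pick which table to update (index 0 or 1).
    i = int(rng.integers(2))
    # Step 2a: select the greedy next action with the table being updated.
    best_next_action = np.argmax(Q[i][next_state])
    # Step 2b: evaluate that action with the other table.
    target = reward + gamma * Q[1 - i][next_state, best_next_action]
    # Step 2c: move Q[i] toward the target by a fraction alpha.
    Q[i][state, action] += alpha * (target - Q[i][state, action])
    return i


i = update_q_tables(state=0, action=1, reward=1.0, next_state=2)
```

Only Q[i] changes on a given call; the other table, Q[1 - i], is read but left untouched, which is what keeps selection and evaluation decoupled.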
