
Introduction to deep Q learning

1. Introduction to deep Q learning

Let's start our exploration of Deep Reinforcement Learning with Deep Q learning.

2. What is Deep Q Learning?

Deep Q-learning builds upon Q-learning. Both techniques revolve around the action-value function Q, which associates a value with every combination of state S_t and action A_t.

3. Q-Learning refresher

Let's first take a short moment to revisit the key notions of Q-Learning. Note that the focus of this course is on hands-on implementation, so a deep theoretical understanding of the math is not required. Q_pi(s,a) denotes the action-value function: the sum of future rewards obtained by taking action a in state s and then always following policy pi. If an agent had perfect knowledge of the Q function, it could always select the action with the highest value, and this would form the basis of an optimal policy. But the optimal action-value function is unknown at the start, and the goal of Q-learning is to learn it over time.
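For intuition, here is a minimal sketch (not part of the course code) of how a known Q function, stored as a Python dictionary, could drive greedy action selection. The state and action names are made up for illustration.

    # Hypothetical Q-table: maps (state, action) pairs to estimated returns
    q_table = {
        ("s0", "left"): 1.2,
        ("s0", "right"): 3.4,
    }

    def greedy_action(state, actions):
        # Pick the action with the highest Q-value in the given state
        return max(actions, key=lambda a: q_table[(state, a)])

    print(greedy_action("s0", ["left", "right"]))  # -> "right"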

4. Q-Learning refresher

Q can be written as a recursive formula called the Bellman equation, expressing the Q-value at the current step in terms of the Q-value at the next step. In Q-learning, with a deterministic environment, the Bellman equation says that Q equals the next reward plus the maximum Q-value over the actions available in the next state. To learn Q, after taking an action at each step, we calculate the right-hand side of the Bellman equation (called equivalently the TD target, Q-target, or target Q-value) and use it to update our estimate of Q. In Q-learning, we used a specific update rule for this. In Deep Q-Learning, as we will see in the next video, we use the TD target slightly differently. But first, let's look at how we approach Q-value estimation.
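As a rough sketch of the tabular update, the TD target and update rule could look like the snippet below. The learning rate alpha and discount factor gamma are assumptions not spelled out in the narration, and all names are illustrative.

    alpha, gamma = 0.1, 0.99  # learning rate and discount factor (assumed values)

    def q_learning_update(q_table, state, action, reward, next_state, actions):
        # TD target: next reward plus the best next-state Q-value
        td_target = reward + gamma * max(q_table[(next_state, a)] for a in actions)
        # Move the current estimate a small step toward the target
        q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])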

5. The Q-Network

In Q-learning, we used a table to learn the Q function. This works well as long as the state space is small. But this does not scale:

6. The Q-Network

as the state space increases in size,

7. The Q-Network

so does the Q-table, and so does the dataset required to learn the Q function.
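To get a feel for the scaling problem, here is a quick back-of-the-envelope calculation (the numbers are illustrative): a tiny grid world has only a handful of states, while even a small binary image already has an astronomically large state space.

    n_grid_states = 4 * 4            # a 4x4 grid world: 16 states, easily tabulated
    n_image_states = 2 ** (64 * 64)  # a 64x64 binary image: roughly 10**1233 states
    print(len(str(n_image_states)))  # 1234 digits -- far too many rows for any table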

8. The Q-Network

Instead, at the heart of Deep Q Learning is a neural network.

9. The Q-Network

The network takes as input the state of the environment: what the agent observes at a given point in time.

10. The Q-Network

Its output layer associates a value with each possible action. Over time, we want the network to learn the action-value function Q. We call this a Q-Network. Q-Networks are commonly used in value-based methods such as the Deep Q-Network algorithm, or DQN. Using a neural network to approximate the value function enables agents to handle high-dimensional states, such as the entire display of a video game.
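To make this input/output contract concrete, here is a hedged sketch of how such a Q-Network could be used to pick an action: feed in the observed state, read out one value per action, and take the argmax. PyTorch is assumed here, and q_network refers to a network like the one sketched under the next slide.

    import torch

    def select_greedy_action(q_network, state):
        # state: 1D tensor describing the current observation
        with torch.no_grad():
            q_values = q_network(state.unsqueeze(0))  # shape: (1, n_actions)
        return int(q_values.argmax(dim=1).item())     # index of the best action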

11. Implementing the Q-network

Let's examine what the Q-Network might look like. It maps the state to action values, which determines the size of its input and output layers; these are the only constraints on the network's architecture. We specify those layer sizes as arguments of the class constructor, __init__. In this example, we arbitrarily define two fully connected hidden layers of 64 nodes each. We can use ReLU activation functions, as they often give good results. Finally, we instantiate the network and the optimizer.
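A minimal PyTorch sketch matching this description is shown below. The exact course code may differ; the choice of the Adam optimizer, the learning rate, and the input/output sizes used when instantiating the network are assumptions.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    class QNetwork(nn.Module):
        def __init__(self, state_size, action_size):
            super().__init__()
            # Input size is fixed by the state, output size by the number of actions
            self.fc1 = nn.Linear(state_size, 64)
            self.fc2 = nn.Linear(64, 64)
            self.out = nn.Linear(64, action_size)

        def forward(self, state):
            x = torch.relu(self.fc1(state))
            x = torch.relu(self.fc2(x))
            return self.out(x)  # one Q-value per action

    q_network = QNetwork(state_size=8, action_size=4)        # sizes are illustrative
    optimizer = optim.Adam(q_network.parameters(), lr=1e-3)  # assumed optimizer and learning rate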

12. Let's practice!

Let's practice!