
Introduction to deep reinforcement learning

1. Introduction to deep reinforcement learning

Hello, and welcome to this course on Deep Reinforcement Learning! I am excited to be your instructor for this course. As Principal Machine Learning Engineer at Komment, I design algorithms to generate high-quality code documentation.

2. Why Deep Reinforcement Learning

Traditional Reinforcement Learning shines in low-dimensional tasks such as the Frozen Lake Gymnasium environment. Many real-world applications, including video games and robotics, require high-dimensional state and action spaces, where traditional Reinforcement Learning struggles to keep up. That's where Deep Reinforcement Learning, or DRL, comes in.

3. The ingredients of DRL

To explore DRL, we need two key ingredients: a good grasp of Reinforcement Learning concepts, and familiarity with Deep Learning and the PyTorch library. DRL agents combine these ingredients, using deep neural networks as their learning engines. We will start by reviewing key Reinforcement Learning concepts.

4. The RL framework

In Reinforcement Learning, an agent evolves in an environment over an episode. At every step t in the episode,

5. The RL framework

the agent observes state s_t from the environment.

6. The RL framework

Based on this, the agent takes an action, a_t.

7. The RL framework

The environment responds to this action by giving the agent a reward, r_{t+1}, and updating the state to s_{t+1}.

8. The RL framework

This process repeats until the episode is complete.
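
In gymnasium code, this interaction loop might look like the sketch below; the Frozen Lake environment id and the random action standing in for the agent's choice are illustrative, not part of the slide.

```python
import gymnasium as gym

# A low-dimensional environment, used here only to illustrate the agent-environment loop
env = gym.make("FrozenLake-v1")

state, info = env.reset()  # the agent observes the initial state s_t
done = False
while not done:
    # Placeholder for the agent's action a_t (here: a random action)
    action = env.action_space.sample()
    # The environment returns the reward r_{t+1} and the next state s_{t+1}
    next_state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated  # the episode is complete when done is True
    state = next_state
```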

9. Policy $\pi(s_t)$

The policy of the agent is the mapping from a given state to the action that the agent will select. It can be deterministic or stochastic; in the latter case, the policy is a probability distribution over possible actions.
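
As a rough illustration (the scores below are made-up values standing in for a network's output for one state), a deterministic policy always picks the highest-scoring action, while a stochastic policy samples from a probability distribution over actions:

```python
import torch
from torch.distributions import Categorical

# Hypothetical action scores for one state
action_scores = torch.tensor([1.2, 0.3, -0.5])

# Deterministic policy: always select the highest-scoring action
deterministic_action = torch.argmax(action_scores).item()

# Stochastic policy: treat the scores as logits of a distribution over actions and sample
stochastic_action = Categorical(logits=action_scores).sample().item()
```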

10. Trajectory and episode return

A trajectory tau is the sequence of all states and actions in an episode. The episode return R_tau is the discounted sum of rewards accumulated along the trajectory tau.
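
With discount factor $\gamma$ and an episode of $T$ steps, a common way to write this is:

$R(\tau) = \sum_{t=0}^{T-1} \gamma^{t} \, r_{t+1}$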

11. Setting up the environment

Let's explore a basic training loop for Deep Reinforcement Learning. First, we instantiate an environment from the gymnasium package, for example, Space Invaders. Next, we define the neural network architecture and instantiate it. Here, for simplicity, we use only one linear module. Finally, we define an Adam optimizer for the network and specify its learning rate.
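
A minimal sketch of that setup, assuming gymnasium's Atari extras are installed; the environment id and the learning rate of 0.001 are illustrative choices:

```python
import math

import gymnasium as gym
import torch.nn as nn
import torch.optim as optim

# Instantiate the Space Invaders environment (assumes gymnasium's Atari extras are installed)
env = gym.make("ALE/SpaceInvaders-v5")

# One linear module mapping flattened observations to one value per action
obs_dim = math.prod(env.observation_space.shape)
n_actions = env.action_space.n
network = nn.Linear(obs_dim, n_actions)

# Adam optimizer for the network, with an illustrative learning rate
optimizer = optim.Adam(network.parameters(), lr=0.001)
```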

12. The basic loop

Most algorithms we will discuss will be variations of this training loop. We iterate through episodes in the outer loop, allowing the agent to repeatedly experience the environment. In each episode, an inner loop iterates through steps until the episode is complete. At each step, the agent chooses an action based on the state and the neural network and then observes the new state and reward. We then calculate a loss function and update the neural network by gradient descent. Finally, state equals next_state sets up the next step. We'll define the select_action and calculate_loss functions later in the course. In DRL, loss functions differ from those in supervised learning. There are no labels; instead, the agent creates its own training data by experiencing the environment. Loss is a tool constructed to obtain gradients that guide agents toward better policies.
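
Sketching that loop in code, assuming the env, network, and optimizer from the previous slide; select_action and calculate_loss are the placeholders defined later in the course, so their signatures here are assumptions, as is num_episodes:

```python
num_episodes = 100  # illustrative hyperparameter

for episode in range(num_episodes):
    # Outer loop: the agent repeatedly experiences the environment
    state, info = env.reset()
    done = False
    while not done:
        # Inner loop: step through the episode until it is complete
        action = select_action(network, state)  # placeholder, defined later in the course
        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        # Construct a loss and update the network by gradient descent
        loss = calculate_loss(network, state, action, reward, next_state, done)  # placeholder
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Set up the next step
        state = next_state
```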

13. Coming next

We'll explore this in detail when we dive deeper into DRL. You will learn that combining neural nets with reinforcement learning is incredibly powerful, allowing us to handle complex, high-dimensional problems. In this course, we will dive into value-based and policy-based approaches to DRL. This includes DQN and several of its refinements and extensions, and an introduction to policy gradient methods. We will start by introducing the fundamental building blocks of Deep Q Learning and training our first DRL agent.

14. Let's practice!

Let's dive into some practice!