DQN with prioritized experience replay
In this exercise, you will introduce Prioritized Experience Replay (PER) to improve the DQN algorithm. Instead of sampling uniformly from the replay buffer, PER samples transitions in proportion to their TD error, so each training batch concentrates on the transitions the network currently predicts worst.
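As background, in the standard PER formulation (Schaul et al., 2015), a transition with priority p_i is sampled with probability

    P(i) = p_i^alpha / sum_k p_k^alpha

and its contribution to the loss is scaled by the importance-sampling weight

    w_i = (1 / (N * P(i)))^beta

where N is the number of stored transitions. Your PrioritizedReplayBuffer may differ in the details, but this is the role of the two exponents: annealing beta toward 1 over training is what increase_beta() is for, and update_priorities() refreshes p_i, typically with the latest absolute TD error plus a small constant.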
For reference, the method names you have declared for PrioritizedReplayBuffer are:
- push() (to push transitions to the buffer)
- sample() (to sample a batch of transitions from the buffer)
- increase_beta() (to increase the influence of importance sampling)
- update_priorities() (to update the sampled priorities)
The describe_episode() function is used again to describe each episode.
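The course environment provides this class for you. Purely as a reference for what such an interface could look like, here is a minimal sketch of a proportional-priority buffer with the same method names. Everything beyond those names is an assumption: the constructor arguments (capacity, alpha, beta, beta_increment), the annealing schedule in increase_beta(), and the return types of sample() (chosen to match how the exercise code unpacks states, actions, rewards, next_states, dones, indices, weights).

import numpy as np
import torch

class PrioritizedReplayBuffer:
    """Illustrative proportional-priority buffer; not the course's implementation."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, beta_increment=1e-3):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priorities shape sampling
        self.beta = beta              # strength of the importance-sampling correction
        self.beta_increment = beta_increment
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.position = 0

    def __len__(self):
        return len(self.buffer)

    def push(self, state, action, reward, next_state, done):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_priority = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append((state, action, reward, next_state, done))
        else:
            self.buffer[self.position] = (state, action, reward, next_state, done)
        self.priorities[self.position] = max_priority
        self.position = (self.position + 1) % self.capacity

    def increase_beta(self):
        # Anneal beta toward 1 so the importance-sampling correction
        # becomes stronger as training progresses.
        self.beta = min(1.0, self.beta + self.beta_increment)

    def sample(self, batch_size):
        priorities = self.priorities[: len(self.buffer)] ** self.alpha
        probs = priorities / priorities.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)

        # Importance-sampling weights, normalized by their maximum.
        weights = (len(self.buffer) * probs[indices]) ** (-self.beta)
        weights = weights / weights.max()

        states, actions, rewards, next_states, dones = zip(
            *(self.buffer[i] for i in indices)
        )
        return (
            torch.as_tensor(np.array(states), dtype=torch.float32),
            torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1),
            torch.as_tensor(rewards, dtype=torch.float32),
            torch.as_tensor(np.array(next_states), dtype=torch.float32),
            torch.as_tensor(dones, dtype=torch.float32),
            indices,
            torch.as_tensor(weights, dtype=torch.float32),
        )

    def update_priorities(self, indices, td_errors):
        # Larger absolute TD error means a higher priority on the next sample.
        for i, td_error in zip(indices, td_errors):
            self.priorities[i] = abs(float(td_error)) + 1e-5

The key design choice is that new transitions enter with the current maximum priority, so every stored transition is sampled at least once before its priority is replaced by its measured TD error.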
Exercise instructions
- Instantiate a Prioritized Experience Replay buffer with a capacity of 10000 transitions.
- Increase the influence of importance sampling over time by updating the beta parameter.
- Update the priority of the sampled experiences based on their latest TD error.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Instantiate a Prioritized Replay Buffer with capacity 10000
replay_buffer = ____(____)

for episode in range(5):
    state, info = env.reset()
    done = False
    step = 0
    episode_reward = 0
    # Increase the replay buffer's beta parameter
    replay_buffer.____
    while not done:
        step += 1
        total_steps += 1
        q_values = online_network(state)
        action = select_action(q_values, total_steps, start=.9, end=.05, decay=1000)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        replay_buffer.push(state, action, reward, next_state, done)
        if len(replay_buffer) >= batch_size:
            states, actions, rewards, next_states, dones, indices, weights = replay_buffer.sample(64)
            q_values = online_network(states).gather(1, actions).squeeze(1)
            with torch.no_grad():
                next_q_values = target_network(next_states).amax(1)
                target_q_values = rewards + gamma * next_q_values * (1-dones)
            td_errors = target_q_values - q_values
            # Update the replay buffer priorities for that batch
            replay_buffer.____(____, ____)
            loss = torch.sum(weights * (q_values - target_q_values) ** 2)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            update_target_network(target_network, online_network, tau=.005)
        state = next_state
        episode_reward += reward
    describe_episode(episode, reward, episode_reward, step)
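If you get stuck, here is one way the blanks could be filled in. It assumes the capacity is the constructor's first argument and that update_priorities() takes the sampled indices followed by their TD errors; both match the instructions above, but check the signatures of your own class.

replay_buffer = PrioritizedReplayBuffer(10000)        # capacity of 10000 transitions
replay_buffer.increase_beta()                         # once per episode, strengthen the IS correction
replay_buffer.update_priorities(indices, td_errors)   # refresh priorities with the latest TD errors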