DQN with prioritized experience replay

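In this exercise, you will introduce Prioritized Experience Replay (PER) to improve the DQN algorithm. Instead of drawing past transitions uniformly, PER samples each transition with a probability that grows with its priority, usually the magnitude of its most recent TD error. As a result, the network is updated more often on the transitions it currently predicts worst.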
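For orientation, these are the formulas of the standard proportional PER variant (Schaul et al.). The symbols alpha (how strongly priorities skew sampling), beta (importance-sampling exponent), epsilon (a small constant that keeps priorities positive) and N (number of stored transitions) are the usual conventions and are not defined elsewhere in this exercise, so treat them as assumptions about how the buffer is parameterized.

```latex
\delta_i = r_i + \gamma \max_{a'} Q_{\text{target}}(s'_i, a') \,(1 - d_i) - Q(s_i, a_i)
\qquad p_i = |\delta_i| + \epsilon

P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}
\qquad
w_i = \frac{\left(N \cdot P(i)\right)^{-\beta}}{\max_j w_j}
```

Here d_i is 1 for a terminal transition and 0 otherwise. Because P(i) is non-uniform, the weights w_i down-weight frequently sampled transitions in the loss; raising beta towards 1 makes that correction complete.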

For reference, the methods you have defined for PrioritizedReplayBuffer are:

push() (to push transitions to the buffer)
sample() (to sample a batch of transitions from the buffer)
increase_beta() (to increase beta, the exponent that controls the strength of the importance-sampling correction)
update_priorities() (to update the priorities of the sampled transitions)
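A minimal sketch of what a buffer with these four methods might look like, using only the Python standard library. The constructor arguments (alpha, beta, beta_increment, epsilon), the Transition namedtuple and the return value of sample() are illustrative assumptions, not the exercise's exact implementation. A production version would normally use a sum tree so that sampling is O(log N) instead of O(N).

```python
import random
from collections import namedtuple

# Hypothetical transition container; the exercise's buffer may store tuples differently.
Transition = namedtuple("Transition", "state action reward next_state done")


class PrioritizedReplayBuffer:
    """Proportional PER sketch: P(i) = p_i**alpha / sum_k p_k**alpha."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, beta_increment=0.001, epsilon=1e-5):
        self.capacity = capacity
        self.alpha = alpha                  # 0 gives uniform sampling, 1 fully proportional
        self.beta = beta                    # importance-sampling exponent, annealed towards 1
        self.beta_increment = beta_increment
        self.epsilon = epsilon              # keeps every priority strictly positive
        self.memory = []
        self.priorities = []
        self.pos = 0                        # next slot to overwrite once the buffer is full

    def push(self, state, action, reward, next_state, done):
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        max_priority = max(self.priorities, default=1.0)
        transition = Transition(state, action, reward, next_state, done)
        if len(self.memory) < self.capacity:
            self.memory.append(transition)
            self.priorities.append(max_priority)
        else:
            self.memory[self.pos] = transition
            self.priorities[self.pos] = max_priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sampling probabilities proportional to priority**alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = random.choices(range(len(self.memory)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest weight in the batch.
        n = len(self.memory)
        weights = [(n * probs[i]) ** (-self.beta) for i in indices]
        max_weight = max(weights)
        weights = [w / max_weight for w in weights]
        batch = [self.memory[i] for i in indices]
        return batch, indices, weights

    def increase_beta(self):
        # Anneal beta towards 1 so the correction is fully applied late in training.
        self.beta = min(1.0, self.beta + self.beta_increment)

    def update_priorities(self, indices, td_errors):
        # Priority is the magnitude of the latest TD error plus a small epsilon.
        for i, td_error in zip(indices, td_errors):
            self.priorities[i] = abs(td_error) + self.epsilon

    def __len__(self):
        return len(self.memory)
```

Giving new transitions the maximum priority is a common design choice: a transition that has never been replayed has no TD error yet, so this makes it likely to be sampled at least once before its priority is measured.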

The describe_episode() function is used again to describe each episode.

Instantiate a Prioritized Experience Replay buffer with a capacity of 10000 transitions.
Increase the influence of importance sampling over time by updating the beta parameter.
Update the priority of the sampled experiences based on their latest TD error.
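The three steps above might fit into a single learning step roughly as follows. This framework-agnostic sketch assumes a buffer exposing sample(), update_priorities() and increase_beta(), with sample() returning (batch, indices, weights); q_fn and target_q_fn are hypothetical callables mapping a state to a list of per-action Q-values for the online and target networks. The weighted squared TD error is one common choice of loss; DQN implementations often use a Huber loss instead.

```python
def per_learning_step(buffer, q_fn, target_q_fn, batch_size=32, gamma=0.99):
    """One PER update: sample, weight the TD errors, refresh priorities, anneal beta.

    Assumes buffer.sample() returns (batch, indices, weights) and that each
    transition unpacks as (state, action, reward, next_state, done).
    """
    batch, indices, weights = buffer.sample(batch_size)

    # TD error: r + gamma * max_a' Q_target(s', a') * (1 - done) - Q(s, a)
    td_errors = []
    for state, action, reward, next_state, done in batch:
        target = reward + gamma * max(target_q_fn(next_state)) * (1 - done)
        td_errors.append(target - q_fn(state)[action])

    # Importance-sampling weights scale each squared TD error before averaging.
    loss = sum(w * d ** 2 for w, d in zip(weights, td_errors)) / len(td_errors)

    # Refresh the priorities of the sampled transitions with their latest TD errors.
    buffer.update_priorities(indices, td_errors)
    # Gradually strengthen the importance-sampling correction.
    buffer.increase_beta()
    return loss

# In the exercise the buffer itself would be created once, before training,
# e.g. replay_buffer = PrioritizedReplayBuffer(10000)
```

With a deep learning framework such as PyTorch, the loss would be backpropagated through the online network, and the TD errors passed to update_priorities() should first be detached from the computation graph.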