DQN with experience replay
You will now introduce Experience Replay to train an agent with a Deep Q-Network, using the same Lunar Lander environment you used to build your Barebone DQN.
At every step, instead of updating the network from the most recent transition only, the agent can draw a random batch of past experiences from the Experience Replay buffer and learn from those. This considerably improves its ability to learn about the environment.
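As a rough sketch of the idea, a replay buffer can be little more than a bounded deque with one method to store transitions and one to draw a random batch. The class below is only an illustration and may differ from the ReplayBuffer you built in the previous exercise.

from collections import deque
import random

class ReplayBuffer:
    """Hypothetical sketch; the ReplayBuffer from the previous exercise may differ."""
    def __init__(self, capacity):
        # Oldest experiences are discarded once capacity is exceeded
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store a single transition
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Return a random batch of stored transitions
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)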
The QNetwork and ReplayBuffer classes from previous exercises are available to you and have been instantiated as follows:
q_network = QNetwork(8, 4)
replay_buffer = ReplayBuffer(10000)
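Here, the arguments 8 and 4 match the Lunar Lander environment: an 8-dimensional observation and 4 discrete actions. A minimal sketch of what such a network might look like is shown below; the hidden layer size and the tensor conversion inside forward() are assumptions for illustration, not necessarily how you defined QNetwork earlier.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical sketch; the QNetwork from the previous exercise may be defined differently."""
    def __init__(self, state_size, action_size):
        super().__init__()
        # Small MLP mapping a state to one Q-value per action
        self.net = nn.Sequential(
            nn.Linear(state_size, 64),
            nn.ReLU(),
            nn.Linear(64, action_size),
        )

    def forward(self, state):
        # Convert a raw NumPy state to a float tensor (assumption),
        # so the network can be called directly on environment observations
        state = torch.as_tensor(state, dtype=torch.float32)
        return self.net(state)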
The describe_episode() function is again available to report metrics at the end of each episode.
Have a go at this exercise by completing this sample code.
for episode in range(10):
    state, info = env.reset()
    done = False
    step = 0
    episode_reward = 0

    while not done:
        step += 1
        q_values = q_network(state)
        action = torch.argmax(q_values).item()
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Store the latest experience in the replay buffer
        replay_buffer.____
        state = next_state
        episode_reward += reward

    describe_episode(episode, reward, episode_reward, step)
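In this exercise the loop only stores experiences; the update that actually learns from them happens once the buffer holds enough transitions. As a preview, a sketch of such an update step is shown below. It assumes the buffer exposes sample() and __len__(), uses an illustrative Adam optimizer, MSE loss, and discount factor, and omits the separate target network a full DQN would normally add.

import numpy as np
import torch
import torch.nn.functional as F

gamma = 0.99        # discount factor (illustrative value)
batch_size = 64     # illustrative value
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)

if len(replay_buffer) >= batch_size:
    # Draw a random batch of past transitions (assumes a .sample() method)
    batch = replay_buffer.sample(batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)

    states = torch.as_tensor(np.stack(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.stack(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q-values of the actions that were actually taken
    q_values = q_network(states).gather(1, actions).squeeze(1)

    # Bootstrapped targets; terminal transitions contribute only their reward
    with torch.no_grad():
        next_q = q_network(next_states).max(dim=1).values
    targets = rewards + gamma * next_q * (1 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()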