DQN with experience replay
You will now introduce Experience Replay to train an agent with a Deep Q-Network, using the same Lunar Lander environment you used to build your barebones DQN.
At every step, instead of updating the network with only the most recent transition, the agent draws a random batch of recent experiences from the Experience Replay buffer and learns from that. This considerably improves its ability to learn about the environment.
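For reference, a replay buffer can be thought of as a fixed-size store of transitions with one method to add an experience and one to sample a random batch. The sketch below only illustrates that idea, assuming push() and sample() as method names; the ReplayBuffer class you built in the earlier exercise may differ in its details.

import random
from collections import deque

class ReplayBufferSketch:
    """Illustrative fixed-size buffer of recent transitions (not the course's class)."""
    def __init__(self, capacity):
        # Once capacity is reached, the oldest experiences are discarded
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition tuple
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw a random batch of past transitions to learn from
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)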
The QNetwork and ReplayBuffer classes from previous exercises are available to you and have been instantiated as follows:
q_network = QNetwork(8, 4)
replay_buffer = ReplayBuffer(10000)
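The call QNetwork(8, 4) matches Lunar Lander: observations are 8-dimensional vectors and there are 4 discrete actions. As a rough sketch of what such a class might look like (an assumption, not the course's definition; the hidden-layer sizes and the NumPy-to-tensor conversion are illustrative choices):

import torch
import torch.nn as nn

class QNetworkSketch(nn.Module):
    """Illustrative network mapping a state vector to one Q-value per action."""
    def __init__(self, state_size, action_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, action_size),
        )

    def forward(self, state):
        # Accept NumPy observations from the environment as well as tensors
        if not isinstance(state, torch.Tensor):
            state = torch.tensor(state, dtype=torch.float32)
        return self.net(state)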
The describe_episode() function is again available to report metrics at the end of each episode.
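describe_episode() is provided for you; a minimal stand-in with the same call signature might simply print the episode's metrics (a hypothetical sketch, not the course's implementation):

def describe_episode(episode, reward, episode_reward, step):
    # Print end-of-episode metrics: index, length, final-step reward, total return
    print(f"Episode {episode} | steps: {step} "
          f"| last reward: {reward:.2f} | episode reward: {episode_reward:.2f}")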
This exercise is part of the course Deep Reinforcement Learning in Python. Try it by completing the sample code below.
for episode in range(10):
    state, info = env.reset()
    done = False
    step = 0
    episode_reward = 0
    while not done:
        step += 1
        # Select the greedy action from the network's Q-values
        q_values = q_network(state)
        action = torch.argmax(q_values).item()
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Store the latest experience in the replay buffer
        replay_buffer.____
        state = next_state
        episode_reward += reward
    describe_episode(episode, reward, episode_reward, step)
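This exercise only stores experiences; sampling from the buffer and updating the network come in later steps. As a hedged sketch of what that later update could look like, assuming the buffer exposes sample() and __len__() and using illustrative values for batch_size, gamma, and the learning rate:

import numpy as np
import torch
import torch.nn as nn

batch_size = 64    # illustrative value
gamma = 0.99       # illustrative discount factor
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)  # illustrative learning rate

if len(replay_buffer) >= batch_size:
    # Draw a random batch of stored transitions (assumed sample() method)
    batch = replay_buffer.sample(batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)

    states = torch.tensor(np.array(states), dtype=torch.float32)
    actions = torch.tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    next_states = torch.tensor(np.array(next_states), dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Q-values of the actions that were actually taken
    q_taken = q_network(states).gather(1, actions).squeeze(1)

    # One-step TD targets; no gradient flows through the bootstrap term
    with torch.no_grad():
        next_q = q_network(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)

    loss = nn.functional.mse_loss(q_taken, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()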