Episode generation for Monte Carlo methods
Monte Carlo methods require episodes to be generated in order to estimate the value function. You'll therefore now implement a function that generates episodes by selecting actions at random until the episode terminates. In later exercises, you will call this function to apply Monte Carlo methods to the custom environment env, which is pre-loaded for you. The render() function is also pre-loaded for you.
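If you want to experiment outside the course page, env and render() can be stood in for locally. The sketch below is only an assumption: it uses an arbitrary Gymnasium environment (FrozenLake-v1 is a placeholder, not the course's custom environment) and a minimal render() helper.

import gymnasium as gym

# Hypothetical stand-ins for the pre-loaded objects; the actual course
# environment and render() helper are custom.
env = gym.make("FrozenLake-v1", render_mode="rgb_array")

def render():
    # In the course, render() displays the current frame; here we just
    # fetch it and discard it.
    env.render()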
This exercise is part of the course
Reinforcement Learning with Gymnasium in Python
Instructions
- Reset the environment using a seed of 42.
- In the episode loop, select a random action at each iteration.
- Once an iteration ends, update the episode data by adding the tuple (state, action, reward).
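The relevant Gymnasium calls for these steps are, roughly (a sketch, assuming env is a standard Gymnasium environment):

# Reset the environment with a reproducible seed
state, info = env.reset(seed=42)

# Draw a random action from the environment's action space
action = env.action_space.sample()

# Record a transition as a (state, action, reward) tuple
episode.append((state, action, reward))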
Hands-on interactive exercise
Try this exercise by completing this sample code.
def generate_episode():
    episode = []
    # Reset the environment
    state, info = ____
    terminated = False
    while not terminated:
        # Select a random action
        action = ____
        next_state, reward, terminated, truncated, info = env.step(action)
        render()
        # Update episode data
        episode.____(____)
        state = next_state
    return episode

print(generate_episode())
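For reference, one possible completion is sketched below, assuming env follows the standard Gymnasium API (env.reset() accepts a seed keyword and env.action_space.sample() draws a random action).

def generate_episode():
    episode = []
    # Reset the environment with a fixed seed for reproducibility
    state, info = env.reset(seed=42)
    terminated = False
    while not terminated:
        # Select a random action from the environment's action space
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        render()
        # Record the transition as a (state, action, reward) tuple
        episode.append((state, action, reward))
        state = next_state
    return episode

print(generate_episode())

Note that the loop stops only when terminated becomes True; if the environment can also truncate episodes (e.g. via a time limit), the condition could be extended to while not (terminated or truncated).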