Episode generation for Monte Carlo methods
Monte Carlo methods require generated episodes in order to estimate the value function. Therefore, you'll now implement a function that generates an episode by selecting actions randomly until the episode terminates. In later exercises, you will call this function to apply Monte Carlo methods to the custom environment env, which is pre-loaded for you.
The render() function is pre-loaded for you.
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Exercise instructions
- Reset the environment using a seed of 42.
- In the episode loop, select a random action at each iteration.
- Once an iteration ends, update the episode data by adding the tuple (state, action, reward).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def generate_episode():
    episode = []
    # Reset the environment
    state, info = ____
    terminated = False
    while not terminated:
        # Select a random action
        action = ____
        next_state, reward, terminated, truncated, info = env.step(action)
        render()
        # Update episode data
        episode.____(____)
        state = next_state
    return episode

print(generate_episode())
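For reference, here is a minimal sketch of one possible completion, assuming env is a Gymnasium environment and render() is the pre-loaded helper. The blanks are filled following the instructions above: a seeded reset, a random action sampled from the environment's action space, and the (state, action, reward) tuple appended to the episode list.

def generate_episode():
    episode = []
    # Reset the environment with a fixed seed for reproducibility
    state, info = env.reset(seed=42)
    terminated = False
    while not terminated:
        # Select a random action from the environment's action space
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        render()
        # Record the transition as a (state, action, reward) tuple
        episode.append((state, action, reward))
        state = next_state
    return episode

print(generate_episode())

Note that, as in the skeleton, the loop stops only on terminated; truncated is returned by env.step() but not used here.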