
Episode generation for Monte Carlo methods

Monte Carlo methods estimate the value function from complete sampled episodes, so you first need a way to generate them. You'll now implement a function that generates an episode by selecting actions at random until the episode ends. In later exercises, you will call this function to apply Monte Carlo methods to the custom environment env, which is pre-loaded for you.

The render() function is pre-loaded for you.
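Monte Carlo methods turn each sampled episode into returns: the return from step t is the (optionally discounted) sum of the rewards that follow it. The sketch below assumes an episode is a list of (state, action, reward) tuples, like the one built in this exercise, and computes every return in one backward pass using G_t = r_{t+1} + gamma * G_{t+1}. The function name compute_returns, the discount factor gamma, and the example episode are illustrative assumptions, not part of the exercise.

```python
def compute_returns(episode, gamma=1.0):
    """Return the list of returns G_t for an episode of (state, action, reward) tuples.

    Iterating backwards lets each return reuse the one computed after it:
    G_t = r_{t+1} + gamma * G_{t+1}.
    """
    returns = []
    g = 0.0
    for _state, _action, reward in reversed(episode):
        g = reward + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns


# Hypothetical three-step episode of (state, action, reward) tuples
example_episode = [(0, 1, 0.0), (3, 2, -1.0), (4, 0, 10.0)]
print(compute_returns(example_episode))  # [9.0, 9.0, 10.0]
print(compute_returns(example_episode, gamma=0.9))
```

Averaging these returns over many episodes, per state or per state-action pair, is what gives the Monte Carlo value estimates.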

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • Reset the environment using a seed of 42.
  • In the episode loop, select a random action at each iteration.
  • At the end of each iteration, update the episode data by appending the tuple (state, action, reward).

Hands-on interactive exercise

Try this exercise using the sample code below.

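# env and render() are pre-loaded in the exercise environment
def generate_episode():
    episode = []
    # Reset the environment with a seed of 42
    state, info = env.reset(seed=42)
    terminated = False
    truncated = False
    # A Gymnasium episode ends when it terminates or is truncated
    while not (terminated or truncated):
        # Select a random action from the action space
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        render()
        # Update episode data with the (state, action, reward) tuple
        episode.append((state, action, reward))
        state = next_state
    return episode


print(generate_episode())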
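To run the same episode loop outside the exercise, you need an object with the same interface as the pre-loaded env. The sketch below uses a hypothetical ToyChainEnv that mimics Gymnasium's reset()/step() return values and exposes an action_space with sample() and seed(); the class names, rewards, and 50-step truncation limit are assumptions made for illustration. One detail worth knowing: in Gymnasium, env.reset(seed=42) seeds the environment but not env.action_space, so random actions are only reproducible if the action space is seeded too (env.action_space.seed(42)). The stub mirrors that behavior.

```python
import random


class ToyActionSpace:
    """Hypothetical stand-in for a Gymnasium Discrete(n) action space."""

    def __init__(self, n):
        self.n = n
        self._rng = random.Random()

    def seed(self, seed=None):
        self._rng.seed(seed)

    def sample(self):
        # Uniformly random action in {0, ..., n - 1}
        return self._rng.randrange(self.n)


class ToyChainEnv:
    """Hypothetical environment following Gymnasium's reset/step conventions.

    States 0..4 lie on a line and every episode starts in state 0.
    Action 0 moves left, action 1 moves right; moving left from state 0
    keeps the agent in state 0. Reaching state 4 terminates the episode
    with reward +10; every other step gives -1. Episodes are truncated
    after max_steps steps.
    """

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.action_space = ToyActionSpace(2)
        self.state = 0
        self.steps = 0

    def reset(self, seed=None):
        # The start state is fixed, so the seed has nothing to randomize here
        self.state = 0
        self.steps = 0
        return self.state, {}

    def step(self, action):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), 4)
        terminated = self.state == 4
        reward = 10.0 if terminated else -1.0
        truncated = (not terminated) and self.steps >= self.max_steps
        return self.state, reward, terminated, truncated, {}


def generate_episode(env, seed=42):
    episode = []
    state, info = env.reset(seed=seed)
    env.action_space.seed(seed)  # reset(seed=...) does not seed the action space
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        # Store the state the action was taken in, the action, and the reward received
        episode.append((state, action, reward))
        state = next_state
    return episode


episode = generate_episode(ToyChainEnv())
print(len(episode), episode[:3])
```

Because both the environment and the action space are seeded, calling generate_episode(ToyChainEnv()) twice produces the same list of tuples, which makes debugging Monte Carlo updates easier.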