Reinforcement Learning with Gymnasium in Python

Exercise

Implementing first-visit Monte Carlo

The goal of Monte Carlo algorithms is to estimate the Q-table in order to derive an optimal policy. In this exercise, you will implement the first-visit Monte Carlo method to estimate the action-value function Q, and then compute the optimal policy to solve the custom environment you saw in the previous exercise. When computing the return, assume a discount factor of 1, so the return from a time step is simply the sum of all subsequent rewards.

The numpy arrays Q, returns_sum, and returns_count, storing the Q-values, the cumulative sum of rewards, and the count of visits for each state-action pair, respectively, have been initialized and pre-loaded for you.

Instructions

  • Define the if condition that should be tested in the first-visit Monte Carlo algorithm.
  • Update the cumulative returns (returns_sum), their counts (returns_count), and the set of visited state-action pairs (visited_states).
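The steps above can be sketched as follows. This is a minimal, self-contained illustration, not the exercise's solution: the environment dimensions, the toy episode, and the helper function first_visit_mc_update are assumptions made for the example, while Q, returns_sum, returns_count, and visited_states play the same roles as the pre-loaded variables described above.

```python
import numpy as np

# Assumed small environment size; the course's custom environment
# defines the real number of states and actions.
n_states, n_actions = 4, 2

# Counterparts of the pre-loaded arrays from the exercise.
Q = np.zeros((n_states, n_actions))
returns_sum = np.zeros((n_states, n_actions))
returns_count = np.zeros((n_states, n_actions))

def first_visit_mc_update(episode):
    """Update Q from one episode, a list of (state, action, reward)."""
    visited_states = set()
    for i, (state, action, reward) in enumerate(episode):
        # First-visit condition: only the first occurrence of each
        # (state, action) pair in the episode triggers an update.
        if (state, action) not in visited_states:
            # Return from step i onward; with a discount factor of 1
            # this is just the sum of the remaining rewards.
            G = sum(r for _, _, r in episode[i:])
            returns_sum[state, action] += G
            returns_count[state, action] += 1
            visited_states.add((state, action))
    # Q-value estimate: average return over all first visits.
    seen = returns_count > 0
    Q[seen] = returns_sum[seen] / returns_count[seen]

# Toy episode, made up for illustration.
first_visit_mc_update([(0, 1, 0.0), (1, 0, 0.0), (2, 1, 1.0)])

# Greedy policy derived from the estimated Q-table.
policy = {s: int(np.argmax(Q[s])) for s in range(n_states)}
```

With a discount factor of 1, every step in the toy episode earns the full return of 1.0, so each visited state-action pair ends up with a Q-value of 1.0 and the greedy policy picks those actions in the corresponding states.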