
Exercise

Computing Q-values

Your goal is to compute the action-values, also known as Q-values, for each state-action pair in the custom MyGridWorld environment under the policy shown below. In RL, Q-values are essential because they represent the expected return of taking a specific action in a given state and following the policy thereafter.

exercise_policy.png

The environment has been imported as env, along with the compute_state_value() function and the required variables (terminal_state, num_states, num_actions, policy, gamma).
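
For reference, the quantity being computed can be written in the standard textbook form below. The symbols P (transition probability), r (reward), and V^π (state value under the policy) are notation introduced here rather than taken from the exercise; gamma corresponds to γ, and V^π is what compute_state_value() is presumed to return.

```latex
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, V^{\pi}(s') \,\bigr]
```

If the grid world is deterministic (an assumption, since the exercise does not say), the sum collapses to a single term: Q(s, a) = r + γ V(s'), where s' is the one state reached by taking action a in state s.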

Instructions

  • Complete the compute_q_value() function to compute the action-value for a given state and action.
  • Create a dictionary Q where each key represents a state-action pair, and the corresponding value is the Q-value for that pair.
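
The two steps above might be sketched as follows. This is a minimal, self-contained illustration rather than the exercise's reference solution: the three-state corridor, the transition table P, the rewards, and the policy are hypothetical stand-ins for MyGridWorld, and the table's layout (a list of (probability, next_state, reward, done) tuples per state-action pair) is assumed to mirror what Gymnasium's toy-text environments expose through env.unwrapped.P. Transitions are assumed deterministic.

```python
# Hypothetical stand-in for MyGridWorld: a 1x3 corridor with states 0, 1, 2.
# State 2 is terminal. Actions: 0 = left, 1 = right.
num_states = 3
num_actions = 2
terminal_state = 2
gamma = 1.0

# Assumed layout: P[state][action] -> [(probability, next_state, reward, done)]
P = {
    0: {0: [(1.0, 0, -1, False)], 1: [(1.0, 1, -1, False)]},
    1: {0: [(1.0, 0, -1, False)], 1: [(1.0, 2, 10, True)]},
    2: {0: [(1.0, 2, 0, True)], 1: [(1.0, 2, 0, True)]},
}

# Hypothetical deterministic policy: always move right.
policy = {0: 1, 1: 1}


def compute_state_value(state):
    """Value of a state when following `policy` (deterministic transitions)."""
    if state == terminal_state:
        return 0
    action = policy[state]
    _, next_state, reward, _ = P[state][action][0]
    return reward + gamma * compute_state_value(next_state)


def compute_q_value(state, action):
    """Q-value: take `action` once in `state`, then follow `policy`."""
    if state == terminal_state:
        return None  # no action is taken from the terminal state
    _, next_state, reward, _ = P[state][action][0]
    return reward + gamma * compute_state_value(next_state)


# One entry per (state, action) pair.
Q = {
    (state, action): compute_q_value(state, action)
    for state in range(num_states)
    for action in range(num_actions)
}

for pair, value in Q.items():
    print(pair, value)
```

In the actual exercise, the transition lookup would presumably go through env rather than a hand-written table, and compute_state_value() is already provided, so only the body of compute_q_value() and the dictionary comprehension would need to be written.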