Exercise

Computing Q-values

Your goal is to compute the action-values, also known as Q-values, for each state-action pair in the custom MyGridWorld environment when following the given policy. In RL, Q-values are essential because they represent the expected return from taking a specific action in a given state and following the policy thereafter.
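
For reference, this definition can be written compactly. The notation is an assumption, since the exercise does not define it: s' is the next state, r the reward received for the transition, and V^π the state-value function under the policy π.

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[\, r + \gamma \, V^{\pi}(s') \;\middle|\; s, a \,\right]
```

If transitions are deterministic, the expectation drops out and the Q-value is simply the immediate reward plus gamma times the value of the next state.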

The environment has been imported as env, along with the compute_state_value() function and the variables you need (terminal_state, num_states, num_actions, policy, gamma).

Instructions

Complete the compute_q_value() function to compute the action-value for a given state and action.
Create a dictionary Q where each key represents a state-action pair, and the corresponding value is the Q-value for that pair.
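
The two steps above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the real MyGridWorld environment is not shown here, so the sketch swaps in a hypothetical three-state corridor with a made-up deterministic `transitions` table, and it defines its own stand-ins for compute_state_value(), policy, gamma, and the other variables. Returning None for the terminal state is also an assumption.

```python
# Hypothetical stand-in for MyGridWorld: a 3-state corridor, NOT the real environment.
terminal_state = 2
num_states = 3
num_actions = 2            # assumed encoding: 0 = left, 1 = right
gamma = 0.5                # discount factor, chosen for easy arithmetic
policy = {0: 1, 1: 1}      # assumed policy: always move right

# Assumed deterministic model: transitions[state][action] -> (next_state, reward)
transitions = {
    0: {0: (0, -1), 1: (1, -1)},
    1: {0: (0, -1), 1: (2, 10)},
}

def compute_state_value(state):
    """Value of a state when following the policy from that state onward."""
    if state == terminal_state:
        return 0
    next_state, reward = transitions[state][policy[state]]
    return reward + gamma * compute_state_value(next_state)

def compute_q_value(state, action):
    """Value of taking `action` in `state`, then following the policy."""
    if state == terminal_state:
        return None  # assumption: no action-value is defined at the terminal state
    next_state, reward = transitions[state][action]
    return reward + gamma * compute_state_value(next_state)

# One entry per (state, action) pair.
Q = {
    (state, action): compute_q_value(state, action)
    for state in range(num_states)
    for action in range(num_actions)
}
print(Q)
```

In the actual exercise, the next state and reward would come from env rather than from a hand-written table; how env exposes them is not stated here, so that lookup is left as an assumption.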
