Exercise

Improving a policy

In the previous exercise, you computed the Q-values for each state-action pair in the MyGridWorld environment. Now you'll use these Q-values to improve the existing policy. Policy improvement is a key step in reinforcement learning: in each state, the policy is updated to choose the action with the highest Q-value, that is, the action with the largest expected return. After improving the policy, you will render the agent's movements as it follows the improved policy.
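Written as an equation, the greedy improvement step picks, in every state, the action with the largest Q-value under the current policy. The notation below ($\pi$ for the current policy, $\pi'$ for the improved one, $Q^{\pi}$ and $V^{\pi}$ for its action-value and state-value functions) is the standard textbook convention and is not defined elsewhere in this exercise:

```latex
\pi'(s) = \arg\max_{a \in \mathcal{A}} Q^{\pi}(s, a) \quad \text{for every state } s,
\qquad\text{so}\qquad
Q^{\pi}\bigl(s, \pi'(s)\bigr) \ge V^{\pi}(s)
\;\Longrightarrow\;
V^{\pi'}(s) \ge V^{\pi}(s).
```

The second part is the policy improvement theorem: acting greedily with respect to $Q^{\pi}$ never makes the policy worse in any state.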

The environment is available as env, the Q-values as Q, and the render() function has also been imported.

Instructions

Find the best action for each state based on the Q-values.
Select the action for the current state from improved_policy.
Execute the selected action to observe its outcome.
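The three steps can be sketched end to end as follows. This is a minimal, self-contained sketch, not the exercise's reference solution: `ToyGridWorld`, the Q-values, and the `render()` function here are hypothetical stand-ins for MyGridWorld, `Q`, and `render()`, and it assumes `Q` is a dict keyed by `(state, action)` and that `env.step()` returns a Gymnasium-style 5-tuple.

```python
# Minimal sketch: greedy policy improvement from a Q-table, then a rollout.
# ToyGridWorld, the Q-values below, and render() are hypothetical stand-ins
# for the exercise's MyGridWorld env, Q, and render().


class ToyGridWorld:
    """A 1x4 corridor: states 0..3, goal at state 3. Actions: 0=left, 1=right."""

    def __init__(self, n_states=4):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        # Move one cell left or right, clipped to the corridor's bounds
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        terminated = self.state == self.n_states - 1
        reward = 10 if terminated else -1
        # Gymnasium-style return: obs, reward, terminated, truncated, info
        return self.state, reward, terminated, False, {}


env = ToyGridWorld()
num_states, num_actions = env.n_states, 2

# Assumed Q layout: dict keyed by (state, action); values are illustrative only
Q = {}
for s in range(num_states):
    Q[(s, 0)] = s - 2.0   # moving left
    Q[(s, 1)] = float(s)  # moving right


def render():
    # Stand-in renderer: print the corridor with the agent marked as "A"
    print("".join("A" if i == env.state else "." for i in range(env.n_states)))


# Step 1: policy improvement, picking the action with the highest Q-value per state
improved_policy = {}
for state in range(num_states):
    improved_policy[state] = max(range(num_actions), key=lambda a: Q[(state, a)])

# Steps 2 and 3: select the action from the improved policy and execute it
state, info = env.reset()
trajectory = [state]
terminated = truncated = False
max_steps = 100  # safety cap in case the policy never reaches the goal
while not (terminated or truncated) and len(trajectory) <= max_steps:
    action = improved_policy[state]
    state, reward, terminated, truncated, info = env.step(action)
    trajectory.append(state)
    render()

print(improved_policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
print(trajectory)       # [0, 1, 2, 3]
```

Because `max` with a `key` returns the first maximal element, ties between actions break toward the lower action index. In the exercise itself, the provided `env`, `Q`, and `render()` would take the place of the stand-ins, and the number of states and actions would come from that environment.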
