Exercise

Applying Expected SARSA

Now you'll apply the Expected SARSA algorithm in a custom grid environment, where the agent must navigate the grid and reach the goal as quickly as possible. The same rules as before apply: the agent receives a reward of +10 for reaching the diamond, -2 for passing through a mountain, and -1 for every other state.

The environment has been imported as env.

Instructions

100 XP

Initialize the Q-table Q with zeros for each state-action pair.
Update the Q-table using the update_q_table() function.
Extract the policy as a dictionary from the learned Q-table.
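
The three steps above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the signature of update_q_table(), the values of alpha and gamma, the grid size (9 states, 4 actions), and the choice of a uniformly random policy for the expectation are all assumptions made here, and the single hand-written transition stands in for a step you would normally get from env.

```python
import numpy as np

# Assumed sizes for illustration; in the exercise they would come from env.
num_states, num_actions = 9, 4

# Step 1: initialize the Q-table with zeros for each state-action pair.
Q = np.zeros((num_states, num_actions))

def update_q_table(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Hypothetical Expected SARSA update (the signature is an assumption).

    The expectation over next actions is taken under a uniformly random
    policy, so it reduces to the mean of the next state's Q-values.
    """
    expected_q = np.mean(Q[next_state])
    Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (
        reward + gamma * expected_q
    )

# Step 2: apply the update for one made-up transition
# (state 0, action 1, reward -1, next state 1).
update_q_table(Q, state=0, action=1, reward=-1, next_state=1)

# Step 3: extract the greedy policy as a dictionary {state: best action}.
policy = {state: int(np.argmax(Q[state])) for state in range(num_states)}
```

In a full training loop, the update would run once per environment step across many episodes before the policy is extracted. Taking the mean of the next state's Q-values, rather than the maximum (Q-learning) or the value of a single sampled action (SARSA), is what distinguishes Expected SARSA in this sketch.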