
Applying Expected SARSA

Now you'll apply the Expected SARSA algorithm in the custom environment shown below, where an agent navigates a grid, aiming to reach the goal as quickly as possible. The same rules as before apply: the agent receives a reward of +10 for reaching the diamond, -2 for passing through a mountain, and -1 for every other state.
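As a reminder, Expected SARSA updates each state-action value toward the expected value of the next state under the current policy:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \sum_{a'} \pi(a' \mid s_{t+1})\, Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

With the uniform random behavior policy used in this exercise, the sum reduces to the mean of the Q-values in the next state.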

(Figure: the custom grid environment, new_cust_env.png)

The environment has been imported as env.
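Before training, you can sanity-check the environment's spaces using standard Gymnasium attributes (the exact sizes depend on the grid):

print(env.observation_space)  # e.g., Discrete(num_states)
print(env.action_space)       # e.g., Discrete(4): one action per direction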

This exercise is part of the course Reinforcement Learning with Gymnasium in Python.

Instructions

  • Initialize the Q-table Q with zeros for each state-action pair.
  • Update the Q-table using the update_q_table() function.
  • Extract the policy as a dictionary from the learned Q-table.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Initialize the Q-table with zeros
Q = ____
for i_episode in range(num_episodes):
    state, info = env.reset()    
    done = False    
    while not done: 
        action = env.action_space.sample()               
        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        # Update the Q-table
        ____
        state = next_state
# Derive policy from Q-table        
policy = {state: ____ for state in range(____)}
render_policy(policy)
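For reference, here is a minimal sketch of one way to fill in the blanks. It assumes env has discrete observation and action spaces, reuses env, num_episodes, and render_policy from the exercise context, and recreates a plausible update_q_table() helper with illustrative values for alpha and gamma; the course's actual helper may differ in detail.

import numpy as np

alpha, gamma = 0.1, 0.95  # assumed learning rate and discount factor
num_states = env.observation_space.n
num_actions = env.action_space.n

def update_q_table(state, action, reward, next_state):
    # Expected SARSA: bootstrap on the expected Q-value of the next
    # state under the behavior policy. With the uniform random policy
    # used above, the expectation is a simple mean over actions.
    expected_q = np.mean(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * expected_q - Q[state, action])

# Initialize the Q-table with zeros for each state-action pair
Q = np.zeros((num_states, num_actions))

for i_episode in range(num_episodes):
    state, info = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        update_q_table(state, action, reward, next_state)
        done = terminated or truncated
        state = next_state

# Greedy policy: pick the highest-valued action in each state
policy = {state: np.argmax(Q[state]) for state in range(num_states)}
render_policy(policy)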