
Computing state-values for a policy

Using the same deterministic MyGridWorld environment, you will now evaluate the policy you defined in the previous exercise. You'll do this by computing the state value of each state under this policy.

The environment has been imported as env, along with the variables you need (terminal_state, num_states, policy, gamma).
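
For reference, the state value under a deterministic policy in a deterministic environment is commonly written as the recurrence below. The symbols are assumptions added here for illustration, not notation from the exercise: \pi(s) is the action the policy picks in state s, s' is the state that action leads to, r(s, \pi(s)) is the reward for that step, and \gamma is the discount factor. The terminal state is conventionally assigned a value of zero.

```latex
V^{\pi}(s) =
\begin{cases}
0 & \text{if } s \text{ is the terminal state} \\
r\bigl(s, \pi(s)\bigr) + \gamma \, V^{\pi}(s') & \text{otherwise}
\end{cases}
```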

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • Complete the function compute_state_value() to compute the value for each state under the given policy.
  • Create a state_values dictionary where each key is the state, and each value is the state value.
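
The two steps above can be sketched on a toy problem. This is a minimal illustration, not the MyGridWorld environment: the 4-state chain, its rewards, and every toy_ name are hypothetical, and the transition table only mimics the (probability, next_state, reward, terminated) layout that Gymnasium's toy-text environments expose through env.unwrapped.P.

```python
# Hypothetical 4-state chain (NOT MyGridWorld): states 0..3, state 3 is terminal.
# toy_P[state][action] is a list of (probability, next_state, reward, terminated)
# tuples, mirroring the layout of env.unwrapped.P in Gymnasium toy-text envs.
toy_P = {
    0: {0: [(1.0, 1, -1, False)]},
    1: {0: [(1.0, 2, -1, False)]},
    2: {0: [(1.0, 3, 10, True)]},
}
toy_terminal_state = 3
toy_num_states = 4
toy_policy = {0: 0, 1: 0, 2: 0}  # always take action 0 (assumed policy)
toy_gamma = 0.9                  # assumed discount factor


def toy_state_value(state):
    # By convention, no further reward is collected from the terminal state.
    if state == toy_terminal_state:
        return 0.0
    action = toy_policy[state]
    # Deterministic dynamics: exactly one transition per (state, action).
    _, next_state, reward, _ = toy_P[state][action][0]
    # Immediate reward plus the discounted value of the successor state.
    return reward + toy_gamma * toy_state_value(next_state)


# One entry per state: key is the state, value is its state value.
toy_state_values = {s: toy_state_value(s) for s in range(toy_num_states)}
print(toy_state_values)
```

With these assumed rewards, the values work out to roughly 6.2, 8, 10, and 0 for states 0 through 3. The recursion terminates only because this toy policy always reaches the terminal state; a policy that loops forever would need an iterative method instead.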

Hands-on interactive exercise

Try this exercise by completing the sample code.

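# Complete the function
def compute_state_value(state):
    # The terminal state ends the episode
    if state == terminal_state:
        return ____
    action = ____
    # Deterministic environment: take the single transition for this
    # (state, action) pair and unpack its next state and reward
    _, next_state, reward, _ = env.unwrapped.P[state][action][0]
    return ____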

# Compute all state values
state_values = {____: ____ for ____ in range(____)}

print(state_values)