Computing state-values for a policy
Using the same deterministic environment, MyGridWorld, you will now evaluate the policy you defined in the previous exercise. You'll do this by computing the value of each state under this policy.
The environment has been imported as env, along with the variables you need (terminal_state, num_states, policy, gamma).
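As a reminder of what is being computed, the recursion below is a sketch of the standard Bellman equation specialized to a deterministic policy in a deterministic environment. The symbols are assumptions chosen for illustration, not names from the exercise: s' is the state reached from s by taking action π(s), r is the reward for that transition, and the terminal state is assigned a value of zero by the usual convention.

```latex
V^{\pi}(s) =
\begin{cases}
0, & s = s_{\text{terminal}} \\
r\bigl(s, \pi(s)\bigr) + \gamma \, V^{\pi}(s'), & \text{otherwise}
\end{cases}
```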
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Exercise instructions
- Complete the function compute_state_value() to compute the value for each state under the given policy.
- Create a state_values dictionary where each key is the state, and each value is the state value.
Hands-on interactive exercise
Complete this sample code to finish the exercise.
# Complete the function
def compute_state_value(state):
    if state == terminal_state:
        return ____
    action = ____
    _, next_state, reward, _ = env.unwrapped.P[state][action][0]
    return ____

# Compute all state values
state_values = {____: ____ for ____ in range(____)}
print(state_values)
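To see the same recursive pattern run end to end without the course environment, here is a minimal sketch on a hypothetical four-state chain. The transition table P, the rewards, the policy, and the value of gamma are all invented for illustration and are not MyGridWorld's. P only mimics the layout unpacked in the template above, where each entry is a list of (probability, next_state, reward, terminated) tuples.

```python
# Hypothetical 4-state chain: 0 -> 1 -> 2 -> 3 (terminal).
# This stand-in table is an assumption; it is NOT the MyGridWorld environment.
terminal_state = 3
num_states = 4
gamma = 0.5

# P[state][action] -> list of (probability, next_state, reward, terminated)
P = {
    0: {0: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 2, -1.0, False)]},
    2: {0: [(1.0, 3, 10.0, True)]},
    3: {0: [(1.0, 3, 0.0, True)]},
}

# Deterministic policy: the single available action (0) in every non-terminal state.
policy = {0: 0, 1: 0, 2: 0}


def compute_state_value(state):
    # By convention, the terminal state has value 0.
    if state == terminal_state:
        return 0.0
    action = policy[state]
    # Deterministic transitions: each list holds exactly one outcome.
    _, next_state, reward, _ = P[state][action][0]
    # Immediate reward plus the discounted value of the successor state.
    return reward + gamma * compute_state_value(next_state)


state_values = {state: compute_state_value(state) for state in range(num_states)}
print(state_values)  # {0: 1.0, 1: 4.0, 2: 10.0, 3: 0.0}
```

Note that the recursion only terminates if following the policy eventually reaches the terminal state. A policy that loops between states would recurse until Python raises a RecursionError, in which case an iterative policy evaluation would be the safer choice.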