Comparing policies

You are given two state value functions (value_function_1 and value_function_2) corresponding to two different policies in the MyGridWorld environment. Your task is to compare these state value functions on a state-by-state basis to determine which policy is more effective.

The variable num_states is available for you to use.
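
For reference, a state-by-state comparison like this matches the usual partial ordering over policies in reinforcement learning. The symbols below (policies $\pi_1$ and $\pi_2$, state value function $v_\pi$, state set $\mathcal{S}$) are not part of the exercise and are introduced here as assumptions:

```latex
\pi_1 \ge \pi_2 \iff v_{\pi_1}(s) \ge v_{\pi_2}(s) \quad \text{for all } s \in \mathcal{S}
```

If the inequality fails in at least one state in each direction, the two policies are not comparable under this ordering.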

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python

Exercise instructions

  • Create a list one_is_better of boolean values, where each element checks whether the state's value in value_function_1 is greater than or equal to the state's value in value_function_2.
  • Create a list two_is_better of boolean values, where each element checks whether the state's value in value_function_2 is greater than or equal to the state's value in value_function_1.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

value_function_1 = {0: 1, 1: 2, 2: 3, 3: 7, 4: 6, 5: 4, 6: 8, 7: 10, 8: 0}
value_function_2 = {0: 7, 1: 8, 2: 9, 3: 7, 4: 9, 5: 10, 6: 8, 7: 10, 8: 0}

# For each state, check whether policy 1's value is at least as high as policy 2's
one_is_better = [____ >= ____ for state in range(num_states)]

# For each state, check whether policy 2's value is at least as high as policy 1's
two_is_better = [____ >= ____ for state in range(num_states)]

if all(one_is_better):
  print("Policy 1 is better.")
elif all(two_is_better):
  print("Policy 2 is better.")
else:
  print("Neither policy is uniformly better across all states.")
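
One possible way to fill in the blanks is sketched below. It is not the course's official solution. The helper names `at_least_as_good` and `verdict` are hypothetical, and `num_states` is derived from the dictionary here on the assumption that it equals the number of states the exercise environment preloads.

```python
# Value functions copied from the exercise: state index -> state value.
value_function_1 = {0: 1, 1: 2, 2: 3, 3: 7, 4: 6, 5: 4, 6: 8, 7: 10, 8: 0}
value_function_2 = {0: 7, 1: 8, 2: 9, 3: 7, 4: 9, 5: 10, 6: 8, 7: 10, 8: 0}

# Assumption: the exercise preloads num_states; here it is taken from the data.
num_states = len(value_function_1)


def at_least_as_good(vf_a, vf_b, n):
    """Hypothetical helper: one flag per state, True where vf_a[state] >= vf_b[state]."""
    return [vf_a[state] >= vf_b[state] for state in range(n)]


def verdict(one_flags, two_flags):
    """Hypothetical helper mirroring the if/elif/else chain of the exercise."""
    if all(one_flags):
        return "Policy 1 is better."
    elif all(two_flags):
        return "Policy 2 is better."
    return "Neither policy is uniformly better across all states."


one_is_better = at_least_as_good(value_function_1, value_function_2, num_states)
two_is_better = at_least_as_good(value_function_2, value_function_1, num_states)

result = verdict(one_is_better, two_is_better)
print(result)
```

Because both checks use `>=`, two identical value functions would satisfy both lists, and the first branch would report policy 1 as better. A stricter comparison could additionally require a strictly higher value in at least one state.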