Transition probabilities and rewards
The Cliff Walking environment has 48 states, numbered 0 to 47 row by row, from the upper-left corner (0) to the lower-right corner (47). Your goal is to investigate the structure of transition probabilities and rewards within this setup. Notably, all rewards, including the reward for reaching the goal, are negative in this environment. This design choice emphasizes minimizing the number of steps taken: each step incurs a penalty, so efficiency is a key consideration when designing effective learning algorithms.
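As a quick sanity check on this numbering, a state index can be mapped back to grid coordinates with integer division, assuming the standard 4x12 Cliff Walking grid (the helper name `state_to_grid` is just for illustration):

```python
def state_to_grid(state, num_cols=12):
    """Map a flat state index to (row, col) on a 4x12 grid."""
    return divmod(state, num_cols)

print(state_to_grid(0))   # upper-left corner -> (0, 0)
print(state_to_grid(47))  # lower-right corner (the goal) -> (3, 11)
```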
The gymnasium library has been imported as gym, and the environment as env. The num_states and num_actions variables from the previous exercise have also been imported.
This exercise is part of the course
Reinforcement Learning with Gymnasium in Python
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Choose the state
state = ____

# Extract transitions for each state-action pair
for action in range(num_actions):
    transitions = ____
    # Print details of each transition
    for transition in transitions:
        ____, ____, ____, ____ = transition
        print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")