
Transition probabilities and rewards

The Cliff Walking environment has 48 states, numbered from 0 to 47 row by row, from the upper-left corner (0) to the lower-right corner (47). Your goal is to investigate the structure of transition probabilities and rewards in this setup. Notably, all rewards, including the reward for reaching the goal, are negative: every step incurs a penalty, so the design emphasizes minimizing the number of steps taken and makes efficiency a key consideration when designing learning algorithms.
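The row-by-row numbering can be sketched as a small helper. This is a minimal illustration, assuming the standard 4×12 Cliff Walking grid; the `to_state` function is a hypothetical helper, not part of the environment's API.

```python
N_ROWS, N_COLS = 4, 12  # standard Cliff Walking grid: 4 rows x 12 columns = 48 states

def to_state(row, col):
    """Row-major numbering: states count left to right, top to bottom."""
    return row * N_COLS + col

print(to_state(0, 0))    # upper-left corner -> state 0
print(to_state(3, 11))   # lower-right corner -> state 47
```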

The gymnasium library has been imported as gym, and the environment is available as env. The num_states and num_actions variables from the previous exercise have also been pre-loaded.

Image showing the cliff walking environment.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Choose the state
state = ____

# Extract transitions for each state-action pair
for action in range(num_actions):
    transitions = ____
    # Print details of each transition
    for transition in transitions:
        ____, ____, ____, ____ = transition
        print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")
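A completed version of the loop might look like the sketch below. It assumes gymnasium's toy-text convention, where env.unwrapped.P[state][action] is a list of (probability, next_state, reward, done) tuples; here a hand-built stand-in dictionary replaces the real env so the snippet runs standalone, and the specific values (state 36 as the bottom-left start state, the -100 cliff penalty) are illustrative of Cliff Walking's deterministic dynamics.

```python
# Stand-in for env.unwrapped.P: maps state -> action -> list of transitions.
# Cliff Walking is deterministic, so each list holds exactly one
# (probability, next_state, reward, done) tuple. Values are illustrative.
P = {
    36: {  # bottom-left corner (row 3, col 0), the start state
        0: [(1.0, 24, -1, False)],    # up: moves one row toward the top
        1: [(1.0, 36, -100, False)],  # right: falls into the cliff, sent back to start
        2: [(1.0, 36, -1, False)],    # down: bumps the wall, stays put
        3: [(1.0, 36, -1, False)],    # left: bumps the wall, stays put
    }
}

num_actions = 4
state = 36

# Extract transitions for each state-action pair
for action in range(num_actions):
    transitions = P[state][action]
    # Print details of each transition
    for transition in transitions:
        probability, next_state, reward, done = transition
        print(f"Probability: {probability}, Next State: {next_state}, "
              f"Reward: {reward}, Done: {done}")
```

Note that every reward printed is negative, matching the environment's design: there is no positive payoff, only smaller or larger penalties.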