
Transition probabilities and rewards

The Cliff Walking environment has 48 states, numbered 0 to 47 row by row, from the upper left corner (0) to the lower right corner (47). Your goal is to investigate the structure of the transition probabilities and rewards in this environment. Notably, all rewards are negative, including the reward for reaching the goal. Because every step incurs a penalty, an agent maximizes its return by reaching the goal in as few steps as possible.
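The row-by-row numbering can be made concrete with a small sketch. It assumes the standard 4-row by 12-column Cliff Walking grid (4 × 12 = 48 states); the helper names `to_state` and `to_position` are hypothetical and not part of Gymnasium.

```python
# Assumed grid shape: 4 rows x 12 columns = 48 states.
N_ROWS, N_COLS = 4, 12


def to_state(row, col):
    """Convert a (row, col) grid position to a row-by-row state index."""
    return row * N_COLS + col


def to_position(state):
    """Convert a state index back to its (row, col) grid position."""
    return divmod(state, N_COLS)


print(to_state(0, 0))    # upper left corner  -> 0
print(to_state(3, 11))   # lower right corner -> 47
print(to_position(36))   # lower left corner  -> (3, 0)
```

Under this layout, the bottom row holds states 36 to 47.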

The gymnasium library has been imported as gym, and the environment is available as env. The variables num_states and num_actions from the previous exercise are also available.

Image showing the cliff walking environment.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Interactive exercise

Complete the sample code to successfully finish this exercise.

# Choose the state
state = ____

# Extract transitions for each state-action pair
for action in range(num_actions):
    transitions = ____
    # Print details of each transition
    for transition in transitions:
        ____, ____, ____, ____ = transition
        print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")
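The loop above can be tried out without Gymnasium installed. The sketch below is a hand-written stand-in, not the library's own code: it builds a table `P` in which `P[state][action]` is a list of `(probability, next_state, reward, done)` tuples, assuming a 4 × 12 grid, deterministic moves, the action order up, right, down, left, a cliff on states 37 to 46 that costs -100 and sends the agent back to the start (state 36), and a cost of -1 for every other step. In the exercise itself, the transitions would presumably come from the environment's own table, for example `env.unwrapped.P[state][action]`.

```python
# Hand-written stand-in for the environment's transition table (assumptions
# listed in the text above); all names here are illustrative.
N_ROWS, N_COLS = 4, 12
START, GOAL = 36, 47
CLIFF = set(range(37, 47))
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def build_transitions():
    """Return table[state][action] = [(probability, next_state, reward, done)]."""
    table = {}
    for state in range(N_ROWS * N_COLS):
        row, col = divmod(state, N_COLS)
        table[state] = {}
        for action, (d_row, d_col) in MOVES.items():
            # Moves that would leave the grid keep the agent at the edge.
            new_row = min(max(row + d_row, 0), N_ROWS - 1)
            new_col = min(max(col + d_col, 0), N_COLS - 1)
            next_state = new_row * N_COLS + new_col
            if next_state in CLIFF:
                # Stepping into the cliff: large penalty, back to the start.
                table[state][action] = [(1.0, START, -100, False)]
            else:
                done = next_state == GOAL
                table[state][action] = [(1.0, next_state, -1, done)]
    return table


P = build_transitions()
num_actions = len(MOVES)

# Choose the state (35 sits directly above the goal in this layout)
state = 35

# Extract transitions for each state-action pair
for action in range(num_actions):
    transitions = P[state][action]
    # Print details of each transition
    for probability, next_state, reward, done in transitions:
        print(f"Probability: {probability}, Next State: {next_state}, "
              f"Reward: {reward}, Done: {done}")
```

Under these assumptions, moving down from state 35 reaches the goal with reward -1 and `done` set to `True`, which matches the point that even the goal reward is negative.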