Transition probabilities and rewards

The Cliff Walking environment has 48 states, numbered 0 to 47 row by row, from the upper-left corner (0) to the lower-right corner (47). Your goal is to investigate the structure of transition probabilities and rewards within this setup. Notably, all rewards, including the reward for reaching the goal, are negative in this environment. This design choice emphasizes minimizing the number of steps taken: each step incurs a penalty, so efficiency is a key consideration when designing effective learning algorithms.
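Since the states are numbered row by row on a 4×12 grid, a flat state index can be converted to grid coordinates with integer division and modulo. A small illustrative sketch (the helper name `to_row_col` is ours, not part of Gymnasium):

```python
# The Cliff Walking grid has 4 rows and 12 columns; states are numbered row by row.
N_COLS = 12

def to_row_col(state):
    """Convert a flat state index (0-47) to (row, col) grid coordinates."""
    return state // N_COLS, state % N_COLS

print(to_row_col(0))   # upper-left corner -> (0, 0)
print(to_row_col(47))  # lower-right corner, the goal -> (3, 11)
```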

The gymnasium library has been imported as gym, and the environment is available as env. The variables num_states and num_actions from the previous exercise have also been imported.

Image showing the cliff walking environment.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python

Hands-on interactive exercise

Try this exercise and complete the sample code.

# Choose the state
state = ____

# Extract transitions for each action in the chosen state
for action in range(num_actions):
    transitions = ____
    # Print details of each transition
    for transition in transitions:
        ____, ____, ____, ____ = transition
        print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")
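One way the blanks could be filled in, as a hedged sketch: in Gymnasium's tabular toy-text environments, `env.unwrapped.P` is a dict mapping state → action → list of `(probability, next_state, reward, done)` tuples, so the transitions line becomes `transitions = env.unwrapped.P[state][action]`. The snippet below mimics that structure with a hand-built dict so it runs without gymnasium installed; the transition values are illustrative of Cliff Walking's deterministic moves from state 0, not read from the library.

```python
# Stand-in for env.unwrapped.P: state -> action -> [(probability, next_state, reward, done)].
# Values mirror Cliff Walking's deterministic moves from state 0 (upper-left corner);
# actions are 0=up, 1=right, 2=down, 3=left.
P = {
    0: {
        0: [(1.0, 0, -1.0, False)],   # up: bump into the wall, stay in place
        1: [(1.0, 1, -1.0, False)],   # right: move to state 1
        2: [(1.0, 12, -1.0, False)],  # down: move to state 12 (next row)
        3: [(1.0, 0, -1.0, False)],   # left: bump into the wall, stay in place
    },
}
num_actions = 4

# Choose the state
state = 0

# Extract transitions for each action in the chosen state
for action in range(num_actions):
    transitions = P[state][action]  # with the real env: env.unwrapped.P[state][action]
    # Print details of each transition
    for transition in transitions:
        probability, next_state, reward, done = transition
        print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")
```

Each transition tuple carries one possible outcome of taking that action in that state; since Cliff Walking is deterministic, every list holds a single tuple with probability 1.0.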