Transition probabilities and rewards

The Cliff Walking environment has 48 states, numbered 0 to 47 row by row, from the upper-left corner (0) to the lower-right corner (47). Your goal is to investigate the structure of the transition probabilities and rewards in this environment. Notably, every reward is negative, including the one received on reaching the goal. Because each step incurs a penalty, the agent maximizes its return by reaching the goal in as few steps as possible, so an effective learning algorithm must find efficient paths.
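To make the numbering concrete, here is a minimal sketch of row-by-row indexing. It assumes the standard 4 x 12 Cliff Walking grid (the dimensions are not stated above, although 4 * 12 matches the 48 states), and the helper name state_index is hypothetical.

```python
# Assumed grid shape for Cliff Walking: 4 rows by 12 columns = 48 states.
NUM_ROWS, NUM_COLS = 4, 12


def state_index(row: int, col: int) -> int:
    """Map a (row, col) grid cell to its state number, counting row by row."""
    return row * NUM_COLS + col


upper_left = state_index(0, 0)
lower_right = state_index(NUM_ROWS - 1, NUM_COLS - 1)
# The cell directly above the lower-right corner is one full row earlier.
above_lower_right = state_index(NUM_ROWS - 2, NUM_COLS - 1)

print(upper_left, lower_right, above_lower_right)  # 0 47 35
```

Moving up or down by one cell changes the state number by the row width, which is why subtracting 12 from a state gives the cell above it.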

The gymnasium library has been imported as gym, and the environment has been created as env. The variables num_states and num_actions from the previous exercise are also available.

Image showing the cliff walking environment.

Choose the state located above the goal state.
For each action, extract the list of transition tuples for the chosen state and store it in transitions.
For each transition, extract the probability, next_state, reward, and done flag.
