Transition probabilities and rewards
The Cliff Walking environment has 48 states, numbered 0 to 47 row by row, from the upper-left corner (0) to the lower-right corner (47). Your goal is to investigate the structure of transition probabilities and rewards in this setup. Notably, all rewards in this environment are negative, including the reward for reaching the goal. This design choice emphasizes minimizing the number of steps taken, since each step incurs a penalty, making efficiency a key consideration when designing effective learning algorithms.
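To make the row-by-row numbering concrete, a state index can be converted to (row, column) coordinates with simple integer arithmetic. The sketch below assumes the standard 4x12 Cliff Walking grid; the helper names state_to_coords and coords_to_state are illustrative, not part of gymnasium.

```python
# The Cliff Walking grid has 4 rows and 12 columns, so 4 * 12 = 48 states.
N_COLS = 12

def state_to_coords(state):
    """Convert a state index (0-47) to (row, col), counting row by row from the top left."""
    return divmod(state, N_COLS)

def coords_to_state(row, col):
    """Convert (row, col) back to the state index."""
    return row * N_COLS + col

print(state_to_coords(0))     # (0, 0)  -> upper-left corner
print(state_to_coords(47))    # (3, 11) -> lower-right corner, the goal
print(coords_to_state(3, 0))  # 36      -> lower-left corner, the usual start state
```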
The gymnasium library has been imported as gym, and the environment as env. num_states and num_actions from the previous exercise have also been imported.
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Choose the state
state = ____

# Extract transitions for each action in the chosen state
for action in range(num_actions):
    transitions = ____
    # Print details of each transition
    for transition in transitions:
        ____, ____, ____, ____ = transition
        print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")
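For reference, gymnasium's tabular environments describe each transition as a (probability, next_state, reward, done) tuple. The sketch below unpacks a hand-built transition list in that format to show the unpacking pattern the blanks above should follow; the numeric values are made up for demonstration, not taken from the actual environment.

```python
# Illustrative transitions in the (probability, next_state, reward, done)
# format used by gymnasium's tabular environments; values are hypothetical.
transitions = [
    (1.0, 24, -1.0, False),    # an ordinary step: -1 reward, episode continues
    (1.0, 36, -100.0, False),  # stepping into the cliff: -100 reward, back to start
]

for transition in transitions:
    probability, next_state, reward, done = transition
    print(f"Probability: {probability}, Next State: {next_state}, Reward: {reward}, Done: {done}")
```

In gymnasium, such tuples are typically read from the environment's transition table (env.unwrapped.P[state][action]) rather than constructed by hand.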