Navigating the RL framework
1. Navigating the RL framework
Building on our RL foundations, let's uncover how its core components interact and influence the agent's strategic decisions.
2. RL framework
The RL framework consists of five key components: the agent, environment, states, actions, and rewards. The agent, acting as the learner or decision-maker, is like a player in a game. It interacts with the environment, which presents various challenges to be solved. Within this environment, a state represents a specific moment in time, much like a video game frame, capturing the current situation that the agent observes. The agent's actions are responses to these states, and rewards from the environment are feedback on these actions, either positive to encourage or negative to discourage certain behaviors.
3. RL interaction loop
Let's demonstrate the agent-environment interaction using a generic code example, setting the stage for advanced scenarios we'll explore using gymnasium environments. The process starts by creating an environment and retrieving the initial state. The agent then enters a loop where it selects an action based on the current state in each iteration. After executing the action, the environment provides feedback in the form of a new state and a reward. Finally, the agent updates its knowledge based on the state, action, and reward it received.
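Since the slide's code isn't reproduced in this transcript, here is a minimal, self-contained sketch of that loop. The ToyEnvironment class and the choose_action and update_knowledge helpers are illustrative placeholders, not gymnasium's API; they stand in for whatever environment and learning rule an agent actually uses.

```python
import random

class ToyEnvironment:
    """Toy stand-in environment: the state is a simple counter."""
    def reset(self):
        self.state = 0
        return self.state                        # initial state

    def step(self, action):
        self.state += action                     # the action changes the state
        reward = 1 if action == 1 else -1        # positive or negative feedback
        return self.state, reward                # new state and reward

def choose_action(state):
    return random.choice([0, 1])                 # placeholder policy

def update_knowledge(state, action, reward):
    pass                                         # a real agent would learn here

env = ToyEnvironment()                           # create an environment
state = env.reset()                              # retrieve the initial state
for _ in range(10):                              # interaction loop
    action = choose_action(state)                # select an action for this state
    state, reward = env.step(action)             # environment returns new state and reward
    update_knowledge(state, action, reward)      # agent updates its knowledge
```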
4. Episodic vs. continuous tasks
In RL, we encounter two types of tasks: episodic and continuous. Episodic tasks are divided into distinct episodes, each with a defined beginning and end. For example, in a chess game played by an agent, each game constitutes an episode. Once a game concludes, the environment resets for the next one. On the other hand, continuous tasks involve ongoing interaction without distinct episodes. A typical example is an agent continuously adjusting traffic lights in a city to optimize flow. In this course, we will primarily focus on episodic tasks, which are generally more common.
5. Return
In RL, actions carry long-term consequences, impacting both immediate and future rewards. The agent's goal goes beyond maximizing immediate gains; it strives to accumulate the highest total reward over time. This leads us to a key concept in RL: the return. The return is the sum of all rewards the agent expects to accumulate throughout its journey. Accordingly, the agent learns to anticipate the sequence of actions that will yield the highest possible return.
6. Discounted return
However, immediate rewards are typically valued more than future ones, leading to the concept of 'discounted return'. This concept prioritizes rewards received sooner by multiplying each reward by a discount factor, gamma, raised to the power of its time step. For example, for expected rewards r1 through rn, the discounted return would be calculated as r1 + gamma * r2 + gamma^2 * r3, and so on.
7. Discount factor
The discount factor gamma, ranging between 0 and 1, is crucial for balancing immediate and long-term rewards. A lower gamma value leads the agent to prioritize immediate gains, while a higher value emphasizes long-term benefits. At the extremes, a gamma of zero means the agent focuses solely on immediate rewards, while a gamma of one treats future rewards as equally important as immediate ones, applying no discount.
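As a quick illustration of these extremes, the snippet below computes the discounted return for a few gamma values; the reward values are made up for illustration (a small immediate reward followed by a large delayed one).

```python
import numpy as np

rewards = np.array([1, 0, 0, 10])                 # made-up rewards: small now, large later

for gamma in [0.0, 0.5, 1.0]:
    discounts = gamma ** np.arange(len(rewards))  # 1, gamma, gamma^2, gamma^3
    print(gamma, np.sum(rewards * discounts))     # 0.0 -> 1.0, 0.5 -> 2.25, 1.0 -> 11.0
```

With a gamma of zero, only the immediate reward of 1 counts; with a gamma of one, the delayed reward of 10 dominates the return.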
8. Numerical example
In this example, we'll demonstrate how to calculate the discounted_return from an array of expected_rewards. We define a discount_factor of 0.9, then create an array of discounts, where each element corresponds to the discount factor raised to the power of the reward's position in the sequence. As we can see, discounts decrease over time, giving less importance to future rewards. Next, we multiply each reward by its corresponding discount and sum the results to compute the discounted_return, which is 8.83 in this example.
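The transcript doesn't list the reward values themselves; the array below is an assumption, chosen because it reproduces the stated result of 8.83 with a discount factor of 0.9.

```python
import numpy as np

expected_rewards = np.array([1, 6, 3])        # assumed rewards; they reproduce the stated 8.83
discount_factor = 0.9

# Each discount is the factor raised to the power of the reward's position.
discounts = discount_factor ** np.arange(len(expected_rewards))
print(discounts)                              # [1.   0.9  0.81]

# Multiply each reward by its discount and sum to get the discounted return.
discounted_return = np.sum(expected_rewards * discounts)
print(round(discounted_return, 2))            # 8.83
```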
9. Let's practice!
Now, let's practice!