Calculating discounted returns for agent strategies
Discounted returns measure the total reward an agent can expect to accumulate over time, taking into account that future rewards are worth less than immediate ones. You are given the expected rewards for two different strategies (exp_rewards_strategy_1 and exp_rewards_strategy_2) of an RL agent. Your task is to calculate the discounted return for each strategy and determine which one yields the higher return.
The numpy library has been imported for you as np.
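The exercise text does not state the formula explicitly. Assuming the usual definition, in which the reward at step t is weighted by the discount factor raised to the power t (starting at t = 0), the discounted return can be written as follows. The symbols G, \gamma, r_t and T are notation introduced here for illustration, not names taken from the exercise.

```latex
G = \sum_{t=0}^{T-1} \gamma^{t}\, r_t
% With the rewards (3, 2, -1, 5) and \gamma = 0.9 given for the first strategy:
G_1 = 3 + 0.9 \cdot 2 + 0.81 \cdot (-1) + 0.729 \cdot 5 = 7.635
```

Under this definition the first reward is left undiscounted, and each later reward counts for a little less than the one before it.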
This exercise is part of the course Reinforcement Learning with Gymnasium in Python.
Hands-on interactive exercise
Try this exercise by completing this sample code.
exp_rewards_strategy_1 = np.array([3, 2, -1, 5])
discount_factor = 0.9
# Compute discounts
discounts_strategy_1 = np.array([____ for i in range(len(exp_rewards_strategy_1))])
# Compute the discounted return
discounted_return_strategy_1 = np.sum(____)
print(f"The discounted return of the first strategy is {discounted_return_strategy_1}")
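One possible way to fill in the blanks is sketched below. It is not necessarily the course's reference solution, and it assumes the discount for step i is discount_factor ** i. The helper name discounted_return is hypothetical and introduced only for this sketch.

```python
import numpy as np


def discounted_return(expected_rewards, discount_factor):
    """Sum of rewards, each weighted by discount_factor ** step index.

    Hypothetical helper; assumes the first reward (step 0) is undiscounted.
    """
    expected_rewards = np.asarray(expected_rewards, dtype=float)
    # Discounts are discount_factor ** 0, discount_factor ** 1, ...
    discounts = discount_factor ** np.arange(len(expected_rewards))
    # Weight each reward by its discount and add them up
    return float(np.sum(expected_rewards * discounts))


exp_rewards_strategy_1 = np.array([3, 2, -1, 5])
discount_factor = 0.9

discounted_return_strategy_1 = discounted_return(exp_rewards_strategy_1, discount_factor)
print(f"The discounted return of the first strategy is {discounted_return_strategy_1}")
```

The same function can be applied to exp_rewards_strategy_2 (its values are not shown in this part of the exercise), and the two results compared to see which strategy yields the higher return.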