Calculating discounted returns for agent strategies

Discounted returns measure the total reward an agent can expect to accumulate over time, weighting future rewards less than immediate ones. You are given the expected rewards for two different strategies of an RL agent (exp_rewards_strategy_1 and exp_rewards_strategy_2). Your task is to calculate the discounted return for each strategy and determine which one yields the higher return.
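
For reference, the discounted return can be written as a weighted sum. This is a sketch under the usual convention, assumed here, that the first reward is undiscounted; the symbols G (return), r_t (expected reward at step t), T (number of steps), and γ (discount factor) are notation introduced for illustration. The second line plugs in the first strategy's rewards and the discount factor of 0.9 from the sample code.

```latex
G = \sum_{t=0}^{T-1} \gamma^{t} r_t
\qquad\text{e.g.}\qquad
G_1 = 3 + 0.9 \cdot 2 + 0.9^{2} \cdot (-1) + 0.9^{3} \cdot 5
    = 3 + 1.8 - 0.81 + 3.645 = 7.635
```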

The numpy library has been imported for you as np.

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python

Hands-on interactive exercise

Try this exercise by completing the sample code.

exp_rewards_strategy_1 = np.array([3, 2, -1, 5])

discount_factor = 0.9

# Compute discounts
discounts_strategy_1 = np.array([____ for i in range(len(exp_rewards_strategy_1))])

# Compute the discounted return
discounted_return_strategy_1 = np.sum(____)

print(f"The discounted return of the first strategy is {discounted_return_strategy_1}")
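
One possible way to fill in the blanks is sketched below; it is not necessarily the course's reference solution. It assumes the discount at step i is discount_factor ** i, so the first reward is undiscounted. The helper name discounted_return is hypothetical and is introduced only so the same calculation can be reused for the second strategy.

```python
import numpy as np

def discounted_return(rewards, discount_factor):
    """Hypothetical helper: sum of rewards[i] * discount_factor**i."""
    rewards = np.asarray(rewards, dtype=float)
    # Assumption: step i is discounted by discount_factor ** i (i = 0, 1, 2, ...)
    discounts = discount_factor ** np.arange(len(rewards))
    return np.sum(rewards * discounts)

exp_rewards_strategy_1 = np.array([3, 2, -1, 5])
discount_factor = 0.9

discounted_return_strategy_1 = discounted_return(exp_rewards_strategy_1, discount_factor)
print(f"The discounted return of the first strategy is {discounted_return_strategy_1}")
```

For these rewards the result is approximately 7.635. Calling the same helper on exp_rewards_strategy_2, once that array is defined, gives the value to compare against.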