
Calculating discounted returns for agent strategies

The discounted return measures the total reward an agent can expect to accumulate over time, with future rewards counting for less than immediate ones: a reward received t steps from now is multiplied by discount_factor ** t. You are given the expected rewards for two different strategies (exp_rewards_strategy_1 and exp_rewards_strategy_2) of an RL agent. Your task is to calculate the discounted return for each strategy and determine which one yields the higher return.

The numpy library has been imported for you as np.
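
As a rough illustration of the idea, the sketch below computes a discounted return for two made-up reward sequences. The helper name discounted_return, the arrays rewards_a and rewards_b, and the discount factor of 0.5 are all assumptions chosen for easy arithmetic; they are not the exercise's data or its expected solution.

```python
import numpy as np


def discounted_return(rewards, discount_factor):
    """Sum the rewards, weighting the reward at step t by discount_factor ** t."""
    rewards = np.asarray(rewards, dtype=float)
    # Discount weights for steps 0, 1, 2, ...: [1, g, g**2, ...]
    discounts = discount_factor ** np.arange(len(rewards))
    return float(np.sum(rewards * discounts))


# Hypothetical reward sequences (not the ones from the exercise)
rewards_a = [1, 1, 1]
rewards_b = [0, 0, 4]
gamma = 0.5  # hypothetical discount factor

return_a = discounted_return(rewards_a, gamma)  # 1 + 0.5 + 0.25 = 1.75
return_b = discounted_return(rewards_b, gamma)  # 0 + 0 + 4 * 0.25 = 1.0

better = "a" if return_a > return_b else "b"
print(f"Return A: {return_a}, return B: {return_b}, better strategy: {better}")
```

Both sequences here have an undiscounted total of 3 or more, yet the one that pays out early ends up ahead once discounting is applied, which is the effect the exercise asks you to check for the two given strategies.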

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

exp_rewards_strategy_1 = np.array([3, 2, -1, 5])

discount_factor = 0.9

# Compute discounts
discounts_strategy_1 = np.array([____ for i in range(len(exp_rewards_strategy_1))])

# Compute the discounted return
discounted_return_strategy_1 = np.sum(____)

print(f"The discounted return of the first strategy is {discounted_return_strategy_1}")