Reinforcement Learning with Gymnasium in Python

Solving a multi-armed bandit

This exercise involves implementing an epsilon-greedy strategy to solve a 10-armed bandit problem, where the epsilon value decays over time to shift from exploration to exploitation.
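The strategy can be summarized in three rules. This is a sketch under stated assumptions: decay is taken to be multiplicative, and each arm's value is assumed to be tracked as a running mean of its rewards. Here Q plays the role of values, N of counts, δ of epsilon_decay, and ε_min of min_epsilon.

```latex
a_t =
\begin{cases}
\text{an arm drawn uniformly from } \{1, \dots, 10\} & \text{with probability } \varepsilon_t,\\
\arg\max_a Q_t(a) & \text{with probability } 1 - \varepsilon_t,
\end{cases}
\qquad
Q_{t+1}(a_t) = Q_t(a_t) + \frac{r_t - Q_t(a_t)}{N_{t+1}(a_t)},
\qquad
\varepsilon_{t+1} = \max\left(\varepsilon_{\min},\ \delta\,\varepsilon_t\right)
```

Early on, ε is large and most pulls are random. As ε shrinks toward its floor, the agent mostly pulls the arm with the highest estimated value, while the floor ε_min keeps a small amount of exploration alive.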

The variables epsilon, min_epsilon, and epsilon_decay have been predefined for you, and the epsilon_greedy() function has already been imported.
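The decay parameters determine how long the agent keeps exploring. The sketch below counts how many steps it takes for epsilon to reach its floor. It assumes multiplicative decay, and the numbers used (1.0, 0.01, 0.999) are hypothetical stand-ins, not necessarily the exercise's predefined values.

```python
import math

# Hypothetical values; the exercise's predefined numbers may differ.
epsilon = 1.0
min_epsilon = 0.01
epsilon_decay = 0.999


def steps_until_floor(epsilon, min_epsilon, epsilon_decay):
    """Count multiplicative decay steps until epsilon is clamped at min_epsilon."""
    if not 0.0 < epsilon_decay < 1.0:
        raise ValueError("epsilon_decay must lie strictly between 0 and 1")
    steps = 0
    while epsilon > min_epsilon:
        # Same clamped update as in the exercise: decay, but never below the floor
        epsilon = max(min_epsilon, epsilon * epsilon_decay)
        steps += 1
    return steps


steps = steps_until_floor(epsilon, min_epsilon, epsilon_decay)
# Closed form: smallest k with epsilon * epsilon_decay**k <= min_epsilon
closed_form = math.ceil(math.log(min_epsilon / epsilon) / math.log(epsilon_decay))
print(steps, closed_form)
```

With these assumed values, exploration fades over a few thousand steps. A smaller epsilon_decay shortens that window roughly in proportion to 1 / |log(epsilon_decay)|.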

Instructions

  • Use the create_multi_armed_bandit() function to initialize a 10-armed bandit problem; it returns true_bandit_probs, counts, values, rewards, and selected_arms.
  • Select an arm to pull using the epsilon_greedy() function.
  • Simulate the reward based on the true bandit probabilities.
  • Decay the epsilon value, ensuring that it does not fall below min_epsilon.
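The steps above can be sketched as a single simulation loop. This is a minimal, self-contained illustration, not the course's solution. The bodies and signatures of create_multi_armed_bandit() and epsilon_greedy() below are hypothetical stand-ins for the provided versions, and the constants (including n_iterations) are assumed. The counts/values bookkeeping (a running-mean update) is included as an assumption because the function returns those arrays, even though the listed steps stop at the epsilon decay.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical stand-ins for the predefined values; the course's numbers may differ.
epsilon = 1.0
min_epsilon = 0.01
epsilon_decay = 0.999
n_bandits = 10
n_iterations = 1000


def create_multi_armed_bandit(n_bandits, n_iterations):
    """Hypothetical version: random success probabilities plus empty bookkeeping arrays."""
    true_bandit_probs = rng.random(n_bandits)          # P(reward = 1) for each arm
    counts = np.zeros(n_bandits)                       # number of pulls per arm
    values = np.zeros(n_bandits)                       # running mean reward per arm
    rewards = np.zeros(n_iterations)                   # reward received at each step
    selected_arms = np.zeros(n_iterations, dtype=int)  # arm pulled at each step
    return true_bandit_probs, counts, values, rewards, selected_arms


def epsilon_greedy(epsilon, values):
    """Hypothetical signature: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(values)))  # random arm in [0, len(values))
    return int(np.argmax(values))              # arm with the highest estimate


true_bandit_probs, counts, values, rewards, selected_arms = create_multi_armed_bandit(
    n_bandits, n_iterations
)

for i in range(n_iterations):
    # Select an arm to pull
    arm = epsilon_greedy(epsilon, values)
    # Simulate a Bernoulli reward: 1 with the arm's true probability, else 0
    reward = float(rng.random() < true_bandit_probs[arm])
    rewards[i] = reward
    selected_arms[i] = arm
    # Update the pull count and the incremental mean estimate for this arm
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
    # Decay epsilon multiplicatively, never letting it fall below the floor
    epsilon = max(min_epsilon, epsilon * epsilon_decay)

print("best true arm:", int(np.argmax(true_bandit_probs)))
print("most-pulled arm:", int(np.argmax(counts)))
```

Comparing the best true arm with the most-pulled arm is a quick sanity check. With enough iterations, the agent should concentrate its pulls on the best (or a near-best) arm once epsilon has decayed.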