Exercise

Solving a multi-armed bandit

In this exercise, you will implement an epsilon-greedy strategy to solve a 10-armed bandit problem. The epsilon value decays over time, so the agent shifts gradually from exploration to exploitation.

epsilon, min_epsilon, and epsilon_decay have been pre-defined for you. The epsilon_greedy() function has been imported as well.

Instructions

Use the create_multi_armed_bandit() function to initialize a 10-armed bandit problem; it returns true_bandit_probs, counts, values, rewards, and selected_arms.
Select an arm to pull using the epsilon_greedy() function.
Simulate the reward based on the true bandit probabilities.
Decay the epsilon value ensuring that it does not fall below the min_epsilon value.
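
The four steps above can be sketched as a single loop. This is a minimal illustration rather than the exercise's reference solution. The bodies of create_multi_armed_bandit() and epsilon_greedy() are not shown in the exercise, so the versions here are hypothetical stand-ins written with the standard library only. The rng argument, the run_bandit() wrapper, n_iterations, seed, the starting values for epsilon, min_epsilon, and epsilon_decay, the Bernoulli (0 or 1) rewards, and the multiplicative decay rule are likewise assumptions.

```python
import random


def create_multi_armed_bandit(n_bandits, rng):
    """Hypothetical stand-in for the exercise's helper (real body not shown)."""
    true_bandit_probs = [rng.random() for _ in range(n_bandits)]  # hidden win rates
    counts = [0] * n_bandits      # number of pulls per arm
    values = [0.0] * n_bandits    # running mean reward per arm
    rewards = []                  # reward received at each step
    selected_arms = []            # arm chosen at each step
    return true_bandit_probs, counts, values, rewards, selected_arms


def epsilon_greedy(epsilon, values, rng):
    """Hypothetical stand-in: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))                    # random arm
    return max(range(len(values)), key=lambda a: values[a])  # best estimate so far


def run_bandit(n_bandits=10, n_iterations=5000, epsilon=1.0,
               min_epsilon=0.01, epsilon_decay=0.999, seed=0):
    # All default values above are placeholders, not the exercise's settings.
    rng = random.Random(seed)
    probs, counts, values, rewards, selected_arms = create_multi_armed_bandit(
        n_bandits, rng)

    for _ in range(n_iterations):
        # Select an arm to pull.
        arm = epsilon_greedy(epsilon, values, rng)
        # Simulate a Bernoulli reward from the arm's true probability.
        reward = 1 if rng.random() < probs[arm] else 0
        # Update the pull count and the incremental mean for that arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        rewards.append(reward)
        selected_arms.append(arm)
        # Decay epsilon multiplicatively, never dropping below min_epsilon.
        epsilon = max(min_epsilon, epsilon * epsilon_decay)

    return probs, counts, values, rewards, selected_arms, epsilon
```

Calling run_bandit() returns the true probabilities, the per-arm counts and value estimates, the reward and arm histories, and the final epsilon. Taking the maximum of min_epsilon and the decayed value is one simple way to enforce the floor; other schedules, such as linear decay, would satisfy the same instruction.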
