Optimize the boosted ensemble
1. Optimize the boosted ensemble
Welcome back! Now that you have created and trained a boosted classifier using the default hyperparameters, it's time to adjust these hyperparameters to maximize performance. Let's tune the boosted ensemble!
2. Starting point: untuned performance
As a starting point, the out-of-the-box performance observed in the exercises was 95%. That's already fantastic given that only the default hyperparameters were used.
3. Tuning workflow
This overview shows the steps you learned for tuning a specification. First, use tune() to flag hyperparameters for tuning in your specification. Then, create a grid of hyperparameter combinations with grid_regular() or one of its alternatives, and use vfold_cv() to create cross-validation folds. You pass all of that into the tune_grid() function and go for coffee or a jog. When you come back, call select_best() to select the best results and finalize your model specification with the winners. As a last step, you fit your final model with the optimal hyperparameters to the training data to get your optimized full model.
4. Step 1: Create the tuning spec
Let's get to coding! As a first step, create the model specification. We fix the number of trees at 500 and flag learn_rate, tree_depth, and sample_size for tuning. The console output reflects these decisions.
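A minimal sketch of that specification, assuming the xgboost engine used in the previous lessons:

```r
library(tidymodels)

# Boosted tree specification: 500 trees fixed,
# three hyperparameters flagged as tuning placeholders
boost_spec <- boost_tree(
    trees = 500,
    learn_rate = tune(),
    tree_depth = tune(),
    sample_size = tune()) %>%
  set_mode("classification") %>%
  set_engine("xgboost")

boost_spec
```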
5. Step 2: Create the tuning grid
Then, we need a grid containing all hyperparameter combinations that we want to try. You already know grid_regular(), which creates an evenly spaced grid of all the hyperparameters. It takes the tuning parameters, which we extract by applying the function parameters() to our dummy specification, and the levels, which is the number of values each tuning parameter should get. Let's specify two levels for each of our three tuning parameters. The result is a tibble with 8 rows, that is, eight possible combinations of the three hyperparameters having two levels each. Another possibility is grid_random(), which creates a random rather than evenly spaced grid. The size argument specifies the number of random combinations in the result. Size equals 8 gives us 8 random combinations of values.
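Both grid functions can be sketched like this (note that newer dials versions prefer extract_parameter_set_dials() over calling parameters() on a specification, but the call below matches the lesson):

```r
# Evenly spaced grid: 2 levels per parameter -> 2^3 = 8 rows
tunegrid_boost <- grid_regular(parameters(boost_spec),
                               levels = 2)

# Alternative: 8 randomly sampled combinations
tunegrid_random <- grid_random(parameters(boost_spec),
                               size = 8)
```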
6. Step 3: The tuning
Now for the actual tuning. The tune_grid() function takes the dummy specification, the model formula, the resamples (some cross-validation folds), a tuning grid, and a list of metrics. In our case, the dummy specification is boost_spec, the model formula states that still_customer is modeled as a function of all other variables, the resamples are six folds of the training data customers_train, the tuning grid is tunegrid_boost, which we created in the previous slide, and metrics is a metric_set containing only the roc_auc metric.
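Assembled in code, with the fold object name and the seed chosen for illustration:

```r
set.seed(99)  # illustrative seed for reproducible folds

# Six cross-validation folds of the training data
folds <- vfold_cv(customers_train, v = 6)

# Evaluate every grid combination on every fold, scored by AUC
tune_results <- tune_grid(boost_spec,
                          still_customer ~ .,
                          resamples = folds,
                          grid = tunegrid_boost,
                          metrics = metric_set(roc_auc))
```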
7. Visualize the result
It's always helpful and interesting to visualize the tuning results. The autoplot() function creates an overview of the tuning results. In our case, we see one panel per sample_size, tree_depth on the x-axis, the AUC on the y-axis, and different colors for different learning rates. The line for the smallest learning rate achieves an area under the curve of only 50%, and there seems to be little difference between a tree_depth of 8 and 12: both reach AUC values from 95% to close to 100%.
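Assuming the tuning results are stored in tune_results as above, the plot is a single call:

```r
# Panel grid: mean AUC for every hyperparameter combination
autoplot(tune_results)
```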
8. Step 4: Finalize the model
The optimal hyperparameter combination can be extracted using select_best(). This gives you a one-row tibble containing one column for every hyperparameter. We see that Model17, with a tree_depth of 8, a learn_rate of 0.1, and a sample_size of 55%, yields the best results. Then, plug these into the specification containing the placeholders using finalize_model(). This finalizes your specification after tuning.
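A sketch of both calls, with illustrative object names:

```r
# One-row tibble holding the winning hyperparameter values
best_params <- select_best(tune_results, metric = "roc_auc")

# Replace the tune() placeholders with the winners
final_spec <- finalize_model(boost_spec, best_params)

final_spec
```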
9. Last step: Train the final model
Finally, you train the final model on the whole training set customers_train. Printing the model reveals information such as that it took 2.3 seconds to train and is 344 kilobytes in size.
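Under the same assumed names, the final fit looks like this:

```r
# Train the finalized specification on the full training set
final_model <- final_spec %>%
  fit(still_customer ~ .,
      data = customers_train)

final_model
```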
10. Your turn!
Now it's your turn to apply that to your boosted ensemble!