The actual tuning

The best hyperparameters yield the best model for your data. Once you have chosen a tuning grid, you need to train and evaluate a model at every grid point to see which point gives the best model performance.

This can take a while: with k-fold cross-validation, an ensemble size of n trees, and a tuning grid of t combinations, k * t boosted ensembles are fitted, which amounts to k * n * t individual trees in total.
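To get a feel for the cost, the count can be worked out in a few lines of base R. This is only a sketch: the six folds and the 27 grid rows come from this exercise, while the ensemble size of 500 trees is an assumed value, since it depends on how boost_spec was specified.

```r
# Sizes taken from this exercise, except n_trees, which is an assumption
n_folds  <- 6     # k: cross-validation folds
n_combos <- 27    # t: rows of the tuning grid
n_trees  <- 500   # n: trees per ensemble (hypothetical, depends on boost_spec)

# One boosted ensemble is fitted per fold and per grid combination
ensemble_fits <- n_folds * n_combos

# Each ensemble consists of n_trees individual trees
total_trees <- ensemble_fits * n_trees

cat("ensemble fits:   ", ensemble_fits, "\n")
cat("individual trees:", total_trees, "\n")
```

Even a modest grid multiplies quickly, which is why coarse grids are a sensible starting point.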

Now it's your turn to do the actual tuning! Pre-loaded are customers_train and the results of the previous exercise, boost_spec and tunegrid_boost:

# A tibble: 27 x 3
   tree_depth    learn_rate  sample_size
        <int>         <dbl>        <dbl>
 1          1  0.0000000001         0.1 
 2          8  0.0000000001         0.1 
 3         15  0.0000000001         0.1 
 4          1  0.00000316           0.1 
 ...
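The 27 rows are consistent with a regular grid of three levels for each of the three hyperparameters. The base R sketch below rebuilds a grid of that shape with expand.grid(). The tree_depth values and the first two learn_rate values match the rows printed above; the remaining values are placeholders picked for illustration and are not taken from the pre-loaded tunegrid_boost.

```r
# Three levels per hyperparameter give 3 * 3 * 3 = 27 combinations.
# The third learn_rate value and the last two sample_size values are
# hypothetical placeholders, not the values of the real tunegrid_boost.
grid_sketch <- expand.grid(
  tree_depth  = c(1L, 8L, 15L),
  learn_rate  = c(1e-10, 10^(-5.5), 0.1),
  sample_size = c(0.1, 0.55, 1)
)

nrow(grid_sketch)      # number of combinations
head(grid_sketch, 4)   # first column varies fastest, as in the printout
```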

This exercise is part of the course

Machine Learning with Tree-Based Models in R


Exercise instructions

Create six folds of the training data using vfold_cv() and save them as folds.
Use tune_grid() to tune boost_spec using your folds, your tuning grid, and the roc_auc metric. Save the results as tune_results.
Plot the results to visualize the tuning process.
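What splitting data into six folds means can be sketched in base R. This is a conceptual illustration only, not how vfold_cv() is implemented; the row count of 100 and the seed are arbitrary assumptions.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
n_rows <- 100   # hypothetical number of training rows
v <- 6          # number of folds

# Repeat the labels 1..v to cover all rows, then shuffle them,
# so fold sizes differ by at most one row
fold_id <- sample(rep(seq_len(v), length.out = n_rows))

# For fold 1: hold out its rows for assessment, fit on the others
assessment_rows <- which(fold_id == 1)
analysis_rows   <- which(fold_id != 1)

table(fold_id)  # fold sizes
```

Each grid combination is then fitted v times, once per held-out fold, and its metric is averaged over the folds.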

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create CV folds of training data
folds <- ___

# Tune along the grid
tune_results <- ___(___,
                    still_customer ~ .,
                    resamples = ___,
                    grid = ___,
                    metrics = metric_set(___))

# Plot the results
___(___)
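The roc_auc metric named in the instructions is the area under the ROC curve: the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The base R sketch below computes it with the rank-sum formula. The function name auc_rank and the toy vectors are hypothetical, and this is not the implementation used by the tuning functions.

```r
# Rank-based AUC; auc_rank is a hypothetical helper for illustration
auc_rank <- function(truth, score) {
  # truth: logical vector, TRUE marks the positive class
  # score: predicted probability of the positive class
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  r <- rank(score)  # tied scores receive average ranks
  (sum(r[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: three of the four positive/negative pairs are ordered correctly
truth <- c(TRUE, FALSE, TRUE, FALSE)
score <- c(0.9, 0.8, 0.3, 0.1)
auc_rank(truth, score)
```

A value of 0.5 corresponds to random ranking and 1 to perfect separation, which is why a higher value marks the better grid point.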

