Variable importance

You already know that bagged trees are an ensemble model that reduces the high variance of single decision trees. You have now learned that the random forest algorithm improves on this by considering only a random subset of the features at each split of each tree. This decorrelates the trees in the ensemble further and typically improves predictive performance.
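The difference between bagging and a random forest can be sketched with the mtry argument, which controls how many predictors are candidates at each split. This is a minimal illustration, assuming the parsnip and ranger packages are installed; it uses the built-in iris data rather than the course data, and the names bagged_spec, forest_spec, bagged_fit, and forest_fit are made up for this example.

```r
library(parsnip)

set.seed(42)

# Number of predictors in iris (all columns except the outcome Species)
n_predictors <- ncol(iris) - 1

# Bagging-like setup: every split may consider all predictors
bagged_spec <- rand_forest(mtry = n_predictors, trees = 200) %>%
    set_mode("classification") %>%
    set_engine("ranger")

# Random forest: every split considers a random subset of 2 predictors
forest_spec <- rand_forest(mtry = 2, trees = 200) %>%
    set_mode("classification") %>%
    set_engine("ranger")

bagged_fit <- fit(bagged_spec, Species ~ ., data = iris)
forest_fit <- fit(forest_spec, Species ~ ., data = iris)

# Out-of-bag error as reported by the underlying ranger objects
bagged_fit$fit$prediction.error
forest_fit$fit$prediction.error
```

On a dataset this small the two errors may be very close; the decorrelation effect tends to matter more when there are many predictors, some of them strongly dominant.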

In this exercise, you will build a random forest yourself and visualize the importance of the predictors using the vip package. The training data, customers_train, is already loaded in your workspace.

This exercise is part of the course

Machine Learning with Tree-Based Models in R


Exercise instructions

Create spec, the specification of a random forest classification model that uses the "ranger" engine and "impurity"-based variable importance.
Create model by fitting spec to the tibble customers_train, with still_customer as the outcome and all other columns as predictors.
Plot the variable importance with the vip() function from the vip package (which is not pre-loaded).
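The same three steps can be sketched on a different dataset. This is only an illustration, assuming the parsnip, ranger, and vip packages are installed; it uses the built-in iris data with Species as the outcome instead of customers_train, and demo_spec, demo_model, and scores are hypothetical names.

```r
library(parsnip)

set.seed(1)

# Step 1: specify a random forest that records impurity-based importance
demo_spec <- rand_forest() %>%
    set_mode("classification") %>%
    set_engine("ranger", importance = "impurity")

# Step 2: fit the specification, using all other columns as predictors
demo_model <- demo_spec %>%
    fit(Species ~ ., data = iris)

# Step 3a: importance scores as a tibble, one row per predictor
scores <- vip::vi(demo_model)
print(scores)

# Step 3b: the same scores as a bar chart
vip::vip(demo_model)
```

Calling the functions as vip::vi() and vip::vip() avoids attaching the package, which matches the note that vip is not pre-loaded. Without importance set in set_engine(), ranger does not store importance scores, so the plotting step would fail.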

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Specify a random forest
spec <- ___ %>%
    set_mode("classification") %>%
    set_engine(___, importance = ___)

# Train the forest
model <- spec %>%
    fit(___,
        ___)

# Plot the variable importance
vip::___(model)


