Variable importance

You already know that bagged trees are an ensemble model that reduces the high variance of single decision trees. You have now learned that the random forest algorithm improves on this further by considering only a random subset of the features at each split. This decorrelates the trees in the ensemble even more, which improves predictive performance.
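As a rough illustration of that difference, the sketch below writes two parsnip specifications that differ only in `mtry`, the number of predictors sampled as split candidates. The predictor count `n_predictors` and the object names are assumptions made for this example, not values from the course.

```r
library(parsnip)   # rand_forest(), set_mode(), set_engine()
library(magrittr)  # %>%

# Hypothetical number of predictor columns; not taken from the course data
n_predictors <- 9

# Bagging-like setup: every predictor is a candidate at every split
bagging_like_spec <- rand_forest(mtry = n_predictors, trees = 500) %>%
    set_mode("classification") %>%
    set_engine("ranger")

# Random forest: only a random subset of predictors is a candidate at each
# split (the square root of the predictor count is a common default for
# classification)
forest_spec <- rand_forest(mtry = floor(sqrt(n_predictors)), trees = 500) %>%
    set_mode("classification") %>%
    set_engine("ranger")

print(forest_spec)
```

A smaller `mtry` makes individual trees less similar to one another, which is the decorrelation effect described above.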

In this exercise, you will build a random forest and plot the importance of its predictors using the vip package. The training data, customers_train, is already loaded in your workspace.

This exercise is part of the course

Machine Learning with Tree-Based Models in R

Exercise instructions

Create spec, the specification of a random forest classification model that uses the "ranger" engine and "impurity"-based variable importance.
Create model by fitting spec to the customers_train tibble, using still_customer as the outcome variable and all other columns as predictors.
Plot the variable importance using the vip() function from the vip package (which is not pre-loaded).

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

# Specify a random forest
spec <- ___ %>%
    set_mode("classification") %>%
    set_engine(___, importance = ___)

# Train the forest
model <- spec %>%
    fit(___,
        ___)

# Plot the variable importance
vip::___(model)
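One possible way to fill in the template is sketched below. Because customers_train only exists in the course workspace, the sketch builds a small simulated stand-in: its predictor columns (`total_trans_amt`, `customer_age`, `credit_limit`) and their values are assumptions for illustration, not the real data. Only `still_customer`, the "ranger" engine, the "impurity" importance setting, and `vip()` come from the instructions.

```r
library(parsnip)   # rand_forest(), set_mode(), set_engine(), fit()
library(magrittr)  # %>%
library(tibble)    # tibble()

set.seed(42)

# Hypothetical stand-in for customers_train (simulated, not the course data)
n <- 200
customers_train <- tibble(
    total_trans_amt = rnorm(n, mean = 4000, sd = 1500),
    customer_age    = sample(25:70, n, replace = TRUE),
    credit_limit    = runif(n, min = 1500, max = 30000)
)
# The simulated outcome depends mostly on total_trans_amt, plus noise
customers_train$still_customer <- factor(
    ifelse(customers_train$total_trans_amt + rnorm(n, sd = 500) > 4000,
           "yes", "no")
)

# Specify a random forest with impurity-based variable importance
spec <- rand_forest() %>%
    set_mode("classification") %>%
    set_engine("ranger", importance = "impurity")

# Train the forest on all predictors
model <- spec %>%
    fit(still_customer ~ ., data = customers_train)

# Plot the variable importance (vip is called by namespace, not attached)
importance_plot <- vip::vip(model)
print(importance_plot)
```

The `importance` argument is passed through `set_engine()` to ranger; without it, ranger does not store importance scores and there is nothing for `vip()` to plot.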