In-sample performance
It's important to know whether your regression model is useful. A useful model is one that captures the structure of the training set well. One way to assess this in-sample performance is to predict on the training data and calculate the mean absolute error of the predictions.
In this exercise, you will evaluate your in-sample predictions using MAE (mean absolute error). The MAE tells you, on average, how far the predictions are from the true values.
It is calculated using the following formula, where \(n\) is the number of predictions made and \(e_i\) is the \(i\)th prediction error (predicted value minus true value):
$$MAE = \frac{1}{n} \cdot \sum_{i=1}^n \lvert e_i \rvert$$
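As a quick sanity check of the formula, here is a minimal R sketch on a made-up vector of prediction errors (the numbers are purely illustrative and not taken from the chocolate data):
# Made-up prediction errors for three observations
errors <- c(1, -0.5, 1.5)

# MAE is the mean of the absolute errors: (1 + 0.5 + 1.5) / 3 = 1
1 / length(errors) * sum(abs(errors))
# [1] 1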
Available in your workspace is model, the regression tree that you built in the previous exercises.
Exercise instructions
- Create in_sample_predictions by using model to predict on the chocolate_train tibble.
- Calculate a vector abs_diffs that contains the absolute differences between the in-sample predictions and the true grades.
- Calculate the mean absolute error according to the formula above.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Predict using the training set
in_sample_predictions <- predict(model, ___)

# Calculate the vector of absolute differences
abs_diffs <- ___(___$___ - ___$___)

# Calculate the mean absolute error
1 / ___ * ___
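For reference, one possible way to fill in the blanks is sketched below. It assumes that predict() on model returns a tibble with a .pred column (as parsnip/tidymodels models do) and that the true grades live in a chocolate_train column named final_grade; both names are assumptions, so adjust them to match your workspace.
# Predict using the training set (in-sample predictions)
in_sample_predictions <- predict(model, chocolate_train)

# Absolute differences between predictions and true grades
# (.pred and final_grade are assumed column names)
abs_diffs <- abs(in_sample_predictions$.pred - chocolate_train$final_grade)

# Mean absolute error, following the formula above
1 / length(abs_diffs) * sum(abs_diffs)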