Mean squared error

Let's focus on the 2017 NBA predictions again. Every year, there are at least a couple of NBA teams that win way more games than expected. If you use the MAE, this accuracy metric does not reflect the bad predictions as much as if you use the MSE. Squaring the large errors from bad predictions will make the accuracy look worse.

In this example, NBA executives want to better predict team wins. You will use the mean squared error to calculate the prediction error. The actual wins are loaded as y_test and the predictions as predictions.

This exercise is part of the course

Model Validation in Python

Exercise instructions

Manually calculate the MSE. $$ MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i ) ^2 }{n} $$
Calculate the MSE using sklearn.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from sklearn.metrics import ____

n = ___(predictions)
# Finish the manual calculation of the MSE
mse_one = sum((y_test - predictions)____) / n
print('With a manual calculation, the error is {}'.format(mse_one))

# Use the scikit-learn function to calculate MSE
mse_two = ____
print('Using scikit-learn, the error is {}'.format(mse_two))

Edit and Run Code

This exercise is part of the course

Model Validation in Python

IntermediateSkill Level

4.8+

Start Course for Free

Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation Exercise 2: Modeling steps Exercise 3: Seen vs. unseen data Exercise 4: Regression models Exercise 5: Set parameters and fit a model Exercise 6: Feature importances Exercise 7: Classification models Exercise 8: Classification predictions Exercise 9: Reusing model parameters Exercise 10: Random forest classifier

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.

Exercise 1: Creating train, test, and validation datasets Exercise 2: Create one holdout set Exercise 3: Create two holdout sets Exercise 4: Why use holdout sets Exercise 5: Accuracy metrics: regression models Exercise 6: Mean absolute error Exercise 7: Mean squared error

Current Exercise

Exercise 8: Performance on data subsets Exercise 9: Classification metrics Exercise 10: Confusion matrices Exercise 11: Confusion matrices, again Exercise 12: Precision vs. recall Exercise 13: The bias-variance tradeoff Exercise 14: Error due to under/over-fitting Exercise 15: Am I underfitting?

Holdout sets are a great start to model validation. However, using a single train and test set if often not enough. Cross-validation is considered the gold standard when it comes to validating model performance and is almost always used when tuning model hyper-parameters. This chapter focuses on performing cross-validation to validate model performance.

Exercise 1: The problems with holdout sets Exercise 2: Two samples Exercise 3: Potential problems Exercise 4: Cross-validation Exercise 5: scikit-learn's KFold()Exercise 6: Using KFold indices Exercise 7: sklearn's cross_val_score()Exercise 8: scikit-learn's methods Exercise 9: Implement cross_val_score()Exercise 10: Leave-one-out-cross-validation (LOOCV)Exercise 11: When to use LOOCV Exercise 12: Leave-one-out-cross-validation

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.

Exercise 1: Introduction to hyperparameter tuning Exercise 2: Creating Hyperparameters Exercise 3: Running a model using ranges Exercise 4: RandomizedSearchCV Exercise 5: Preparing for RandomizedSearch Exercise 6: Implementing RandomizedSearchCV Exercise 7: Selecting your final model Exercise 8: Best classification accuracy Exercise 9: Selecting the best precision model Exercise 10: Course completed!