
Leave-one-out cross-validation (LOOCV)

1. Leave-one-out cross-validation (LOOCV)

Welcome back - in this lesson we take KFold cross-validation another step forward and discuss leave-one-out cross-validation.

2. LOOCV

The name says it all. In leave-one-out cross-validation, we implement KFold cross-validation where k is equal to n, the number of observations in the data. This means every single point will be used as a validation set, completely by itself. For the first model, we use all of the data for training except the first point, which is used for validation. In model 2, we leave out only the second data point; in model 3, the third; and so on. We create n models for the n observations in the data. It might seem odd to use a single point as a complete validation set, but recall what you will do after leave-one-out cross-validation is complete: you will report the average error across the n model runs.
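The splitting scheme described above can be sketched with scikit-learn's LeaveOneOut splitter; the tiny five-point array here is a made-up example just to show that we get one split per observation, each holding out a single point:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Toy dataset with n = 5 observations (illustrative values only)
X = np.array([[1], [2], [3], [4], [5]])

loo = LeaveOneOut()

# One split per observation: k is equal to n
n_splits = loo.get_n_splits(X)
print(n_splits)  # 5

# Each validation set contains exactly one point
for train_idx, val_idx in loo.split(X):
    print(train_idx, val_idx)
```

The loop prints five train/validation index pairs, and each validation array holds a single index, matching the "leave only one point out" idea.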

3. When to use LOOCV?

You can use this technique when your data is limited and you want to use as much training data as possible when fitting the model. This method is also used because it provides the best error estimate possible for a single new point. Consider that you just ran n models, each time leaving out a single point. If you are given a single new point and need to estimate your error, leave-one-out cross-validation is the right method to use. Unfortunately, this method is very computationally expensive. You should be careful using it if you have a lot of data, or if you are planning on testing a lot of different parameter sets. The best way to judge if this method is even feasible is to run KFold cross-validation with a large k, maybe 25 or 50, and gauge how long it would take you to actually run leave-one-out cross-validation with the n observations in your data.
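The feasibility check above can be sketched as follows: time a single large-k KFold run, then scale the per-fold cost up to n folds. The model, data sizes, and fold count here are illustrative assumptions, not a prescription:

```python
import time
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.ensemble import RandomForestRegressor

# Illustrative random data: 200 observations, 5 features
rng = np.random.RandomState(42)
X = rng.rand(200, 5)
y = rng.rand(200)

model = RandomForestRegressor(n_estimators=10, random_state=42)

# Time KFold with a large k (25 here, as one of the suggested values)
k = 25
start = time.time()
cross_val_score(model, X, y, cv=KFold(n_splits=k))
elapsed = time.time() - start

# Each fold fits one model, so extrapolate the per-fold cost to n folds
est_loocv_seconds = (elapsed / k) * X.shape[0]
print(f"{k}-fold took {elapsed:.1f}s; LOOCV would take roughly {est_loocv_seconds:.1f}s")
```

If the extrapolated time is already uncomfortable at n = 200, LOOCV on a much larger dataset, or across many parameter sets, is likely off the table.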

4. LOOCV Example

Implementing leave-one-out cross-validation can be done using cross_val_score(). You only need to set the parameter cv equal to the number of observations in your dataset, which you can find from the shape of the X dataset. The result of running leave-one-out cross-validation is a list of errors, each one the error of a model fit with a single point left out. The list will have n values, where n is the number of observations. Finally, we print the mean and use this as our overall error metric.
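A minimal sketch of this pattern, using a made-up regression dataset and LinearRegression as a stand-in model (absolute error is used here because it is well defined on a single left-out point):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# Illustrative dataset: 50 observations, 3 features
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.rand(50) * 0.1

# Number of observations, taken from the shape of X
n = X.shape[0]

model = LinearRegression()

# cv equal to n gives leave-one-out splits: one score per left-out point
scores = cross_val_score(model, X, y, cv=n, scoring="neg_mean_absolute_error")
errors = -scores  # scikit-learn returns negated errors for this scorer

print(len(errors))    # 50: one error per observation
print(errors.mean())  # overall error metric
```

The mean of the n single-point errors is the overall LOOCV error estimate described above.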

5. Let's practice

Let's start practicing leave-one-out cross-validation.