1. Measuring cross-validation performance
Now that you've generated your cross-validated data frames and models, let's learn how to use the validation data to measure the performance of each model.
2. Measuring Performance
To measure the validation performance of your models, you need to compare the actual values of life expectancy in the validate data frames to those generated by the prediction model.
To do this you need to first prepare both sets of values.
3. Measuring Performance - Truth
First you need to isolate the actual values.
4. Measuring Performance - Truth
5. Measuring Performance - Truth
I will refer to this vector of values as actual values.
6. Measuring Performance - Prediction
Next, you need to use the features of these observations
7. Measuring Performance - Prediction
along with the model
8. Measuring Performance - Prediction
to generate a series of predictions for the validation data.
9. Measuring Performance
Now that you have both the predicted and actual values of life expectancy, you can compare them directly. By measuring the differences between them, you can assess overall performance using your preferred metric.
10. Mean Absolute Error
The metric I prefer is called the Mean Absolute Error, or MAE.
This metric captures the average magnitude by which the predictions differ from the actual values.
The most appealing trait of this metric is that it has an intuitive interpretation. Using the MAE, you have an idea of how much, on average, your model's prediction will differ from reality.
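As a quick illustration with made-up numbers (not the course's data), the MAE is just the mean of the absolute differences between actual and predicted values:

```r
actual    <- c(70.2, 68.5, 75.1)   # hypothetical actual life expectancies
predicted <- c(71.0, 67.9, 74.4)   # hypothetical model predictions

# MAE: average magnitude of the prediction errors
mean(abs(actual - predicted))      # 0.7
```

Here the model's predictions are off by 0.7 years of life expectancy on average.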
11. Ingredients for Performance Measurement
To summarize, you need three ingredients to measure performance:
The actual life expectancy values, the predicted life expectancy values, and a metric to compare the two.
Now let's learn how to do this in R.
12. 1) Extract the actual values
To extract the actual values of life expectancy from the validate data frames, you can use the map() function. Here the dot x refers to each validate data frame, so you can use the dollar operator to access the life expectancy column vector.
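A minimal sketch of this step; the cv_data tibble, its validate list-column, and the life_expectancy column name are assumptions based on this course's example:

```r
library(dplyr)
library(purrr)

# For each fold, pull the life_expectancy column out of its validate data frame
cv_prep <- cv_data %>%
  mutate(validate_actual = map(validate, ~ .x$life_expectancy))
```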
13. The predict() & map2() functions
In order to generate the predicted values, you need to use the predict() function. This function requires two inputs: the model and the data to predict on.
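As a standalone illustration, using the built-in mtcars data rather than the course's data:

```r
# Fit a model on one subset of rows, then predict on the held-out rows
model <- lm(mpg ~ wt, data = mtcars[1:20, ])
predict(model, mtcars[21:32, ])
```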
You now need to expand your collection of map tools to include the map2() function. This is very similar to the map() function you learned about in Chapter 1, except that it works with two input columns.
The syntax is much the same, except you now use dot x and dot y as your first two parameters, and you refer to these placeholders in the formula in the same way.
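A toy example of the map2() syntax with arbitrary inputs:

```r
library(purrr)

a <- c(1, 2, 3)
b <- c(10, 20, 30)

# .x walks through a, .y walks through b, element by element
map2(a, b, ~ .x + .y)   # returns a list: 11, 22, 33
```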
14. 2) Prepare the predicted values
As before, you can use the map2() function inside mutate() to append a column of predictions for each cross-validation fold.
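Continuing the earlier sketch; the model list-column name is an assumption, standing in for wherever the fitted model for each fold is stored:

```r
library(dplyr)
library(purrr)

# For each fold, predict on its validate data frame using its fitted model
cv_prep <- cv_prep %>%
  mutate(validate_predicted = map2(model, validate, ~ predict(.x, .y)))
```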
15. 3) Calculate MAE
Now that you have the actual and predicted values for each cross-validation fold, you can compare them using the mae() function from the Metrics package.
Again, you can use a map2() variant. Since you know the result will be a double vector, you can use the map2_dbl() function directly to ensure the values are returned as a numeric vector instead of a list.
And this is how you measure the performance for each cross-validation fold.
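Putting the pieces together under the same assumed column names:

```r
library(dplyr)
library(purrr)
library(Metrics)

# map2_dbl() compares each pair of vectors and returns one MAE per fold
cv_prep <- cv_prep %>%
  mutate(validate_mae = map2_dbl(validate_actual, validate_predicted,
                                 ~ mae(actual = .x, predicted = .y)))

cv_prep$validate_mae   # one performance value per cross-validation fold
```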
16. Let's practice!
Now it's your turn to calculate the performance of your cross-validated linear regression models.