1. Model evaluation
Well done on reviewing logistic regression!
In this final lesson of the course, we will go over model evaluation.
2. Introduction
We've fitted linear and logistic regression models, but we haven't yet checked how good they are at making predictions.
Companies need to know whether their models work and, consequently, whether they can trust the predictions.
3. Validation set approach
The slide presents a dataset where y is the response variable.
To validate your model, you can randomly divide the dataset into two parts:
4. Validation set approach
A training set and a test set.
5. Validation set approach
You fit the model on the training set.
6. Validation set approach
Then, you derive the predictions using the explanatory variables of the test set.
7. Validation set approach
The final step is to compare the predicted values with the actual values of the test set and compute the evaluation metric. You can iteratively change your model to improve the evaluation metric.
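As an illustration, here is a minimal sketch of the validation set approach in Python with scikit-learn. It assumes a pandas DataFrame df with a binary response column named y; the 70/30 split and the logistic regression model are example choices for this sketch, not requirements.

# A minimal validation set sketch, assuming a DataFrame `df` with a response column "y".
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = df.drop(columns="y")  # explanatory variables
y = df["y"]               # response variable

# Randomly divide the dataset into a training set and a test set (70/30 here).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the model on the training set.
model = LogisticRegression()
model.fit(X_train, y_train)

# Derive predictions from the explanatory variables of the test set,
# then compare them with the actual values via an evaluation metric.
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))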
8. Cross-validation
Another approach is called k-fold cross-validation, where k stands for the number of folds. Let's use 5-fold cross-validation as an example.
9. Cross-validation
In 5-fold cross-validation, we divide the whole dataset into five random subsamples of approximately the same size.
10. Cross-validation
We then use one of the subsamples as a test set and the remaining four as a training set.
11. Cross-validation
We repeat that five times
12. Cross-validation
so that each subsample
13. Cross-validation
is used as a test set
14. Cross-validation
exactly once. We can then average the five metric values to get a single number for comparison against other models.
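As a sketch, scikit-learn's cross_val_score can run this whole procedure; the X, y, and logistic regression model below are the same illustrative assumptions as in the earlier sketch.

# A minimal 5-fold cross-validation sketch, reusing X and y from the previous example.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

# cv=5 splits the data into five folds; each fold serves as the test set exactly once.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(scores)         # five accuracy values, one per fold
print(scores.mean())  # their average: one number to compare against other models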
15. Confusion matrix
Now, let's move to evaluation metrics. We will go over regression and classification metrics. Take a look at the confusion matrix. There are four possible classification results.
16. Confusion matrix
The diagonal elements represent the predictions for which the predicted label is equal to the true label. The two remaining elements represent incorrect predictions.
17. Confusion matrix
Either the actual result was true, and we predicted false,
18. Confusion matrix
or the actual result was false, and we predicted true.
19. Classification metrics
Accuracy is the number of correct predictions divided by the total number of predictions. Precision is the number of true positives divided by the sum of true and false positives. Recall, on the other hand, is the number of true positives divided by the sum of true positives and false negatives.
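To make these definitions concrete, here is a small sketch with scikit-learn; the true and predicted labels are made up and purely illustrative.

# A small sketch of the confusion matrix and the three classification metrics.
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predicted labels

# For binary labels, the matrix is [[TN, FP], [FN, TP]]: rows are actual, columns predicted.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(accuracy_score(y_true, y_pred))   # (tp + tn) / (tp + tn + fp + fn) = 0.75
print(precision_score(y_true, y_pred))  # tp / (tp + fp) = 0.75
print(recall_score(y_true, y_pred))     # tp / (tp + fn) = 0.75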
20. Classification metrics
Different metrics are used for various purposes.
Precision may be the right choice for a spam detector, for example. We would rather let a spam e-mail through as non-spam than mark a legitimate e-mail as spam and have the user miss something important.
Recall may be useful for classifying a rare disease, since it's better to raise the alarm when the symptoms even resemble the disease than to miss an actual case.
21. Regression metrics
To evaluate classification, we count the number of correct and incorrect predictions.
22. Regression metrics
In regression, we measure the distance between the actual and the predicted values.
23. Regression metrics
Root Mean Squared Error and Mean Absolute Error are commonly used metrics.
Root Mean Squared Error measures the average magnitude of the error by taking the square root of the average of squared differences between the prediction and actual observation.
Mean Absolute Error is the average of the absolute differences between the predictions and the actual values.
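As a minimal sketch with numpy, using made-up actual and predicted values:

# Computing RMSE and MAE directly from their definitions; the numbers are made up.
import numpy as np

actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = predicted - actual

rmse = np.sqrt(np.mean(errors ** 2))  # Root Mean Squared Error
mae = np.mean(np.abs(errors))         # Mean Absolute Error

print(rmse, mae)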
24. Regression metrics
The difference between these two metrics is that Root Mean Squared Error gives a relatively high weight to large errors.
You should use this metric if large errors are particularly undesirable.
For example, if it's worse for you to be wrong by ten than to be wrong by five twice, then choose Root Mean Squared Error over Mean Absolute Error as your metric.
Mean Absolute Error may be preferable because its interpretation is straightforward.
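A quick numeric sketch of that trade-off, with made-up error values:

# One error of ten versus two errors of five: same total absolute error.
import numpy as np

one_big = np.array([10.0, 0.0])
two_small = np.array([5.0, 5.0])

for errors in (one_big, two_small):
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    print(mae, rmse)

# MAE is 5.0 in both cases, but RMSE is about 7.07 for the single error of ten
# and 5.0 for the two errors of five, so RMSE penalizes the large error more.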
25. Summary
To summarize, we've covered the validation set approach, cross-validation, the confusion matrix, classification metrics, and regression metrics.
26. Let's practice!
Let's practice model evaluation!