1. Evaluating models
Let's dive into more detail for the evaluation phase of model development. We'll start with regression evaluation techniques and then cover classification.
2. Regression techniques
Let's go over the three most common regression evaluation techniques to prepare you for your next interview: first, the coefficient of determination, or R-squared; next, the mean absolute error, or MAE; and finally, the mean squared error, or MSE.
3. R-squared
We previously touched on R-squared when we discussed analyzing relationships between two or more variables.
R-squared tells us the proportion of variance of the dependent variable that is explained by the regression model. Here, the residuals are plotted and show us how good a fit our model is. This is often the first metric data scientists go to when evaluating their model. In Python, we can use the score function to get this.
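As a quick sketch of that score function, here is how R-squared might be computed with scikit-learn; the data below is invented purely for illustration.

```python
# Minimal sketch: R-squared via scikit-learn's score method (toy data).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly linear, made-up values

model = LinearRegression().fit(X, y)
r_squared = model.score(X, y)  # proportion of variance explained
print(round(r_squared, 3))
```

Because the toy data is nearly linear, the score comes out close to 1; the same value is also available via sklearn.metrics.r2_score.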
4. MAE vs. MSE
Next up are mean absolute error and mean squared error. MAE is the sum of the absolute residuals divided by the number of points, and MSE is the sum of the squared residuals divided by the number of points.
The resulting penalty functions look like this, with absolute error scaling linearly and squared error scaling quadratically. As a result, different scenarios call for different metrics. In the exercises, you can leverage the mean-underscore-absolute-underscore-error function in Python.
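A short sketch of both metrics, using scikit-learn and made-up predictions:

```python
# Sketch: computing MAE and MSE with scikit-learn (illustrative values).
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |residuals| -> 0.75
mse = mean_squared_error(y_true, y_pred)   # mean of squared residuals -> 0.875
print(mae, mse)
```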
5. MAE vs. MSE
Here's a pretty typical question that interviewers might ask you, concerning which metric you should minimize.
Typically, if your dataset has outliers or if you're worried about individual observations, you'll want to use MSE, since squaring the errors weights them more heavily.
On the other hand, if you aren't as concerned with outliers or singular observations, MAE is the better choice, since taking the absolute value instead of squaring weights large errors less heavily.
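To see the difference concretely, here is a small sketch with invented data where a single outlying prediction moves MSE far more than MAE:

```python
# Sketch: one outlier inflates MSE much more than MAE (toy data).
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [2.0, 4.0, 6.0, 8.0]
clean = [2.1, 3.9, 6.2, 7.8]          # small errors everywhere
with_outlier = [2.1, 3.9, 6.2, 18.0]  # one wildly wrong prediction

mae_clean = mean_absolute_error(y_true, clean)          # 0.15
mse_clean = mean_squared_error(y_true, clean)           # 0.025
mae_out = mean_absolute_error(y_true, with_outlier)     # 2.6
mse_out = mean_squared_error(y_true, with_outlier)      # 25.015
print(mae_clean, mse_clean)
print(mae_out, mse_out)
```

The outlier grows MAE by a factor of about 17 but MSE by a factor of about 1000, which is why MSE is the metric to minimize when large individual errors matter.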
6. Classification techniques
Next up is classification, and we'll talk more about precision, recall, and confusion matrices.
7. Precision
Precision is the number of true positives over the number of true positives plus false positives. It can be interpreted as the percentage of your positive predictions that are actually positive and is linked to the rate of type I errors.
8. Recall
Recall is the number of true positives over the number of true positives plus false negatives. It can be interpreted as the percentage of actual positives that your model correctly identifies and is linked to the rate of type II errors.
9. Confusion matrix
Interviewers may ask you to choose a metric based on the context of the problem. Using the confusion matrix that we discussed earlier in the course, we can easily see where our model's weaknesses are, whether it's false positives, known as type I errors, or false negatives, known as type II errors. This also ties in nicely with precision and recall.
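Here is a minimal sketch of building a confusion matrix with scikit-learn; the labels are invented for illustration.

```python
# Sketch: confusion matrix with scikit-learn (invented labels).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
# With labels ordered [0, 1], the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[3 1]
           #  [1 3]]
```

The off-diagonal cells are exactly the type I errors (top right, false positives) and type II errors (bottom left, false negatives).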
10. Confusion matrix
For example, if you're building a spam detector, you want to avoid type I errors, that is, flagging legitimate emails as spam, so you should optimize for precision.
11. Confusion matrix
On the other hand, if you're trying to classify a rare disease, you normally want to avoid type II errors, that is, missing actual cases, so recall is your priority. The precision-underscore-score and recall-underscore-score functions will help you get these metrics in Python.
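A quick sketch of those two functions, reusing the same style of invented labels:

```python
# Sketch: precision_score and recall_score from scikit-learn (toy labels).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
print(precision, recall)
```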
12. Summary
Let's summarize the lesson. We went over regression evaluation methods, including R-squared, mean absolute error, and mean squared error. For classification, we talked about precision, recall, and confusion matrices.
13. Let's prepare for the interview!
So, let's go to the exercises and get some practice in!