
Classification metrics

1. Classification metrics

Welcome back! We already understand classification models; now let's look at their accuracy metrics.

2. Classification metrics

Classification accuracy metrics are quite a bit different from regression ones. Remember, with classification models, we are predicting what category an observation falls into. There are a lot of accuracy metrics available, including precision, recall, accuracy, specificity, the F1-score, alternate forms of the F1-score, and several others.

3. Classification metrics

We will focus on precision, recall, and accuracy, as each of these is easy to understand and has very practical applications. One way to calculate these metrics is to use the values from the confusion matrix.

4. Confusion matrix

When making predictions, especially if there is a binary outcome, this matrix is one of the first outputs you should review. When we have a binary outcome, the confusion matrix is a 2x2 matrix that shows how your predictions fared across the two outcomes. For example, for predictions of 0 that were actually 0 (or true negatives), we look at the 0, 0 square of the matrix. All of the accuracy metrics from the previous slide can be calculated using the values from this matrix, and it is a great way to visualize the initial results of your classification model.

5. Create confusion matrix with scikit-learn

We can create a confusion matrix using scikit-learn's confusion_matrix() function. When dealing with binary data, this produces a 2x2 array representing the confusion matrix. In this matrix, the row index represents the true category, and the column index represents the predicted category. Therefore, the 1, 0 entry of the array represents the number of true 1s that were predicted to be 0, or 8 in this example.
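A minimal sketch of this call, using small illustrative label arrays rather than the course data:

from sklearn.metrics import confusion_matrix

# Illustrative true and predicted labels (not the course data)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Rows are the true classes, columns are the predicted classes:
# cm[0, 0] = true negatives,  cm[0, 1] = false positives
# cm[1, 0] = false negatives, cm[1, 1] = true positives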

6. Accuracy

Accuracy is the easiest metric to understand and represents the overall ability of your model to predict the correct classification. Using the confusion matrix, we add the values that were predicted 0 and actually are 0 (called true negatives) to the values predicted to be 1 that actually are 1 (called true positives), and then divide by the total number of observations. In this case, our accuracy was 85%. In this example, you can think of a true positive as a predicted 1 that is actually a 1. However, if your categories were win or loss, you might think of a true positive as a predicted win that was actually a win.
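A quick worked check of that calculation. The narration gives 62 true positives, 7 false positives, and 8 false negatives; an 85% accuracy then implies 23 true negatives and 100 observations in total, which is what the arithmetic below assumes.

# Accuracy = (true negatives + true positives) / all observations
tn, fp, fn, tp = 23, 7, 8, 62
accuracy = (tn + tp) / (tn + fp + fn + tp)
print(accuracy)  # 0.85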

7. Precision

Next is precision, or the number of true positives out of all predicted positive values. We correctly predicted 62 true positives but also predicted 7 false positives. Therefore, the precision is 62 divided by 69. Precision is used when we don't want to overpredict positive values. If it costs $2,000 to fly in a potential new employee, a company may only hold on-campus interviews with individuals they really believe are going to join their company. In the example data, almost 9 out of 10 predicted 1's would have joined the company.
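The same calculation in code, using the counts from the narration:

# Precision = true positives / (true positives + false positives)
tp, fp = 62, 7
precision = tp / (tp + fp)
print(round(precision, 2))  # 0.9 -- almost 9 out of 10 predicted 1's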

8. Recall

The recall metric is about finding all positive values. Here we correctly predicted 62 true positives and had 8 false negatives. Our recall is 62 out of 70. Recall is used when we can't afford to miss any positive values. For example, even if a patient has a small chance of having cancer, we may want to give them additional tests. The cost of missing a patient who has cancer is far greater than the cost of additional screenings for that patient.
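And the corresponding recall calculation, again using the counts from the narration:

# Recall = true positives / (true positives + false negatives)
tp, fn = 62, 8
recall = tp / (tp + fn)
print(round(recall, 2))  # 0.89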

9. Accuracy, precision, recall

The scikit-learn functions for accuracy, precision, and recall are all called the same way: pass the true and predicted values to the desired metric function, and a single value is returned. In this example, we get the same values that we calculated using the confusion matrix.
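A minimal sketch of those calls; y_true and y_pred are placeholders for your actual and predicted labels, not the course data.

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative labels (not the course data)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)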

10. Practice time

Let's work through a couple of examples using these accuracy metrics.