Assessing the model's performance
1. Assessing the model's performance
You learned how to build multi-class classification models in Keras. Now we will see how to measure the model's performance.

2. Accuracy is not too informative
Imagine that you built a model and achieved eighty percent accuracy on the test data. Is this model good? Can you say whether the model is classifying all the classes correctly? Or whether the accuracy is the same for each class? Or whether the model is predicting only the class with the most observations and thus achieving high accuracy? We cannot tell by looking only at the accuracy!

3. Confusion matrix
In the confusion matrix above, we have the predictions on the columns and the true labels on the rows. The numbers represent the frequencies of predicted or true labels in each of the classes. The total number of observations is 100. We can see that the model has 80 percent accuracy by summing the diagonal of the matrix, which corresponds to the correct predictions for each class: sci.space, alt.atheism and soc.religion.christian. But we can also see that the model almost always predicts the class sci.space! This model is over-fitted to this class and has poor performance on the others. Can we find a better metric?

4. Precision
One possible metric is precision. For each class, it measures how many of the observations predicted as that class actually belong to it.

5. Recall
Another is recall, which measures, for each class, how many of its true observations are correctly classified.

6. F1-Score
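Precision, recall and the F1 score can all be computed directly from a confusion matrix. A minimal sketch with invented counts (the numbers below are made up for illustration and are not the matrix from the slides):

```python
import numpy as np

# Hypothetical 3-class confusion matrix with invented counts.
# Rows are true labels, columns are predictions.
cm = np.array([[40,  3,  2],
               [15,  5,  5],
               [20,  5,  5]])

# Precision per class: correct predictions divided by all
# predictions of that class (column sums).
precision = np.diag(cm) / cm.sum(axis=0)

# Recall per class: correct predictions divided by all true
# observations of that class (row sums).
recall = np.diag(cm) / cm.sum(axis=1)

# F1 score per class: the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
```

Note how the overloaded first column mimics a model that almost always predicts the first class: that class keeps a high recall, while its precision and the other classes' scores suffer.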
The F1 score is the harmonic mean of precision and recall.

7. Sklearn confusion matrix
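A minimal sketch of the scikit-learn call, using invented labels for a 3-class problem (here 0, 1 and 2 stand in for the three newsgroup classes):

```python
from sklearn.metrics import confusion_matrix

# Invented true and predicted labels for illustration.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 0, 1, 2, 0, 2, 2]

# Rows are the true labels, columns are the predictions.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```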
We can compute the confusion matrix using the confusion_matrix function from sklearn.metrics. The function receives two parameters, y_true and y_pred, containing the true and the predicted values. We can see from the output that the model seems to have higher accuracy for the first class.

8. Performance metrics
Now, let's see how to compute the performance metrics with Python. We will use the implementations in sklearn. The performance metrics can be found in the sklearn.metrics module, and we can import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score and classification_report.

9. Performance metrics
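A sketch of those calls on invented labels; with average=None each score comes back per class rather than as a single number:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Invented true and predicted labels for illustration.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 0, 1, 2, 0, 2, 2]

# One overall number for the whole model.
acc = accuracy_score(y_true, y_pred)

# average=None returns one score per class, exposing differences
# that the overall accuracy hides.
prec = precision_score(y_true, y_pred, average=None)
rec = recall_score(y_true, y_pred, average=None)
f1 = f1_score(y_true, y_pred, average=None)
print(acc, prec, rec, f1)
```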
The accuracy_score gives the overall performance of the model. To use the other functions, we should pass the parameter average=None (this will be explained later). We can check the values of the metrics and see that they vary greatly from class to class. F1 scores of 0.15 and 0.35 for the second and third classes are not good performance.

10. Classification report
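A sketch of the call on invented labels; the target_names values below are assumptions matching the example classes, not output from a real model:

```python
from sklearn.metrics import classification_report

# Invented true and predicted labels for illustration.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 0, 1, 2, 0, 2, 2]

# target_names labels each row of the report for readability.
names = ["sci.space", "alt.atheism", "soc.religion.christian"]
report = classification_report(y_true, y_pred, target_names=names)
print(report)
```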
All those metrics can be computed by one function: classification_report. This function can receive the parameter target_names with the class names for a nicer printout. The output is a formatted string with precision, recall, f1-score and support for each class. Support is the number of observations of the class present in the data. At the end, it also shows the micro average, macro average and weighted average. The micro average computes the metrics globally, counting the total true positives, false positives and false negatives over all classes, and yields a single value. The macro average is the unweighted mean of precision, recall and f1-score over all labels, and the weighted average is the mean weighted by support.

11. Let's practice!
Well, it is time to measure the performance of the models you created!