
Evaluate the Decision Tree

You can assess the quality of your model by evaluating how well it performs on the testing data. Because the model was not trained on these data, this gives an objective measure of its performance.
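For context, here is a minimal sketch of how the prediction DataFrame used below might have been produced (assuming the flights data already has assembled features and label columns; the 80/20 split ratio and seed are illustrative, not prescribed by the course):

from pyspark.ml.classification import DecisionTreeClassifier

# Hold out a testing set that the model never sees during training
flights_train, flights_test = flights.randomSplit([0.8, 0.2], seed=17)

# Fit a decision tree on the training data only
tree = DecisionTreeClassifier()
tree_model = tree.fit(flights_train)

# Generate predictions on the held-out testing data
prediction = tree_model.transform(flights_test)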

A confusion matrix gives a useful breakdown of predictions versus known values. It has four cells, which represent the counts of:

  • True Negatives (TN) — model predicts negative outcome & known outcome is negative
  • True Positives (TP) — model predicts positive outcome & known outcome is positive
  • False Negatives (FN) — model predicts negative outcome but known outcome is positive
  • False Positives (FP) — model predicts positive outcome but known outcome is negative

These counts (TN, TP, FN and FP) should sum to the number of records in the testing data, which is only a subset of the full flights data. You can check this against flights_test.count().

Note: These predictions are made on the testing data, so the counts are smaller than they would have been for predictions on the training data.
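Once the four counts have been computed (see the code below), a quick sanity check is to confirm that they account for every record in the testing set:

# The four cells of the confusion matrix should cover all testing records
assert TN + TP + FN + FP == flights_test.count()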

This exercise is part of the course Machine Learning with PySpark.


Exercise instructions

  • Create a confusion matrix by counting the combinations of label and prediction. Display the result.
  • Count the number of True Negatives, True Positives, False Negatives and False Positives.
  • Calculate the accuracy.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a confusion matrix
prediction.groupBy(____, 'prediction').____().____()

# Calculate the elements of the confusion matrix
TN = prediction.filter('prediction = 0 AND label = prediction').count()
TP = prediction.____('____ AND ____').____()
FN = prediction.____('____ AND ____').____()
FP = prediction.____('____ AND ____').____()

# Accuracy measures the proportion of correct predictions
accuracy = ____
print(accuracy)
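
For reference, one possible completion of the exercise (assuming prediction is the DataFrame of model output, with label and prediction columns as above):

# Create a confusion matrix by counting label/prediction combinations
prediction.groupBy('label', 'prediction').count().show()

# Calculate the elements of the confusion matrix
TN = prediction.filter('prediction = 0 AND label = prediction').count()
TP = prediction.filter('prediction = 1 AND label = prediction').count()
FN = prediction.filter('prediction = 0 AND label != prediction').count()
FP = prediction.filter('prediction = 1 AND label != prediction').count()

# Accuracy is the proportion of correct predictions
accuracy = (TN + TP) / (TN + TP + FN + FP)
print(accuracy)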