Evaluating Model Performance

1. Evaluating Model Performance

How good is your model? How can you measure its performance and then improve it? That's what we'll go over in this video.

2. Accuracy

One way of evaluating your model's performance is by computing its accuracy. Let's say that out of 100 customers, your model made correct predictions about 90 customers, and incorrect predictions about 10 customers. In this case, it would be 90% accurate. But what data should you use to compute this accuracy? If we computed the accuracy on the training data, it would not be indicative of how the model would perform on new data, which is what we're really interested in! And we want to be sure our model is performing well before we actually deploy it, as millions of dollars may be at stake!
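To make that concrete, here is a minimal sketch (not from the video itself) of the accuracy calculation, using the counts from the example above:

    # Accuracy is simply the fraction of predictions the model got right.
    correct_predictions = 90
    total_predictions = 100

    accuracy = correct_predictions / total_predictions
    print(accuracy)  # 0.9, i.e. 90% accurate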

3. Training and Test Sets

To this end, it is common practice to split your data into two sets: a training set and a test set. You fit your classifier to the training set, then make predictions on the test set, which the model has never seen before. This is a better way to gauge how well your model generalizes.

4. Training and Test Sets using scikit-learn

Time to split the telco dataset using sklearn! First, you'll need to import the train_test_split function from sklearn dot model selection. The first argument of this function is the feature data, and the second is the target. The third argument specifies the proportion of the data you want to set aside for testing. In this case, 80% of the data will be in the training set and 20% in the test set, which is a good starting point. The random state argument allows you to specify a seed. Setting a seed ensures that your results are reproducible, as all splits made with the same seed will be identical. train_test_split returns four arrays, which are unpacked into four variables, as you can see here. These represent, respectively, the training data, test data, training labels, and test labels. We then instantiate our classifier as before, fit it to the training data, and make our predictions on the test data.
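Here is a minimal, self-contained sketch of that workflow. The telco churn data itself isn't reproduced in this transcript, so synthetic data stands in for it, and KNeighborsClassifier stands in for whichever classifier was instantiated earlier; the train_test_split call and its arguments are the real scikit-learn ones.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic data stands in for the telco feature data (X) and churn labels (y).
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

    # 80% of the rows go to the training set, 20% to the test set.
    # random_state seeds the split so the results are reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Instantiate a classifier (KNeighborsClassifier is a stand-in here),
    # fit it to the training data, and predict on the unseen test data.
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print(y_pred[:5])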

5. Computing Accuracy

To compute the accuracy of this model, you can use the classifier's score method, passing it the test data and the test labels. As you can see here, the accuracy of this model is quite good!
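Continuing the sketch above (reusing the fitted knn and the test split), the score method returns the accuracy on the data and labels you pass it:

    # Accuracy of the classifier on the held-out test set.
    test_accuracy = knn.score(X_test, y_test)
    print(test_accuracy)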

6. Improving your model

This raises the question: How can you improve your model's performance? You'll learn how to do so in subsequent lessons. If your model fits the training data too closely, it becomes sensitive to noise in the training data and does not generalize well to new data. In such a situation, the model performs really well on the training data, but not so well on the test data. This is known as overfitting. On the other hand, if your model is too simple, it will not capture the trends in your training data and will not make good predictions on either the training data or the test data. This is known as underfitting. In order to build a good model, it's critical to find the right balance between overfitting and underfitting.
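As an illustrative sketch (again reusing the synthetic split from the earlier example), one way to spot over- and underfitting is to compare training and test accuracy as model complexity changes; for k-nearest neighbors, a smaller n_neighbors means a more complex model:

    # Compare training and test accuracy at different levels of complexity.
    for n_neighbors in (1, 5, 50):
        knn = KNeighborsClassifier(n_neighbors=n_neighbors)
        knn.fit(X_train, y_train)
        print(
            n_neighbors,
            knn.score(X_train, y_train),  # training accuracy
            knn.score(X_test, y_test),    # test accuracy
        )
    # A large gap (high training accuracy, much lower test accuracy) suggests
    # overfitting; low accuracy on both sets suggests underfitting.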

7. Let's practice!

That's enough from me. It's now your turn to practice splitting your data and computing accuracy on the test data!