In-sample model fit and thresholding
1. In-sample model fit and thresholding
In this section, we will go over how to evaluate your model.
2. Pseudo $R^2$ statistics I

For logistic regression there are several measures of model fit. Similar to the $R^2$ in linear regression, there are three so-called pseudo $R^2$ statistics: the McFadden, the Cox & Snell, and the Nagelkerke. For these statistics, values greater than 0.2 classify a model as reasonable, greater than 0.4 as good, and greater than 0.5 as very good.
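As a minimal sketch of what these statistics compute, assuming a logistic regression `logitModel` fitted with `glm()` and `family = binomial` (the model and data names here are placeholders):

```r
# Pseudo R^2 statistics by hand; `logitModel` is a hypothetical fitted model,
# e.g. glm(returnCustomer ~ ., family = binomial, data = churnData)
LL_full <- as.numeric(logLik(logitModel))   # log-likelihood of the fitted model
nullModel <- update(logitModel, . ~ 1)      # intercept-only (null) model
LL_null <- as.numeric(logLik(nullModel))
n <- nobs(logitModel)

mcFadden   <- 1 - LL_full / LL_null
coxSnell   <- 1 - exp((2 / n) * (LL_null - LL_full))
nagelkerke <- coxSnell / (1 - exp((2 / n) * LL_null))
```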
3. Pseudo $R^2$ statistics II

The `LogRegR2()` function from the `descr` package gives us several goodness-of-fit measures, all of which tell us that the explanatory power of our model is poor. The algorithm seems to have trouble explaining a big portion of the variance. Well, there is a phrase: "garbage in, garbage out". It means that a model can only be as good as the data you have.
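A short sketch of that call, reusing the hypothetical `logitModel` from above:

```r
# LogRegR2() reports the likelihood-based fit measures, including the
# McFadden, Cox & Snell, and Nagelkerke pseudo R^2 statistics, in one call
library(descr)
LogRegR2(logitModel)
```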
4. Predict probabilities

Another goodness-of-fit measure is called accuracy. It puts the correct predictions in relation to the overall number of observations. We first predict the probabilities by using the `predict()` function with the argument `type = "response"` specified. Setting the argument `na.action` to `na.exclude` excludes observations with missing values. We display the actual observed values next to the new predictions using the `select()` and `tail()` functions.
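A sketch of those steps; the data frame `churnData` and the column `returnCustomer` are placeholder names:

```r
library(dplyr)

# Predicted probabilities of returning; na.exclude keeps the result
# aligned with the rows of the data frame despite missing values
churnData$predReturn <- predict(logitModel, type = "response",
                                na.action = na.exclude)

# Compare the observed outcome with the predicted probability
churnData %>%
  select(returnCustomer, predReturn) %>%
  tail()
```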
5. Confusion matrix

Now, observations are classified according to a certain threshold, and the predicted and observed outcomes are compared in a so-called confusion matrix. I use the function `confusion.matrix()` from the `SDMTools` package and hand it the vector of actual observed classes and the vector of predicted probabilities. By default, a threshold of 0.5 is used. A classification is correct if it predicts an observation to be 0 where its true value is 0; this is called a true negative. A classification is also correct if it predicts an observation to be 1 where its true value is 1; that is called a true positive. In all other cases the prediction is wrong and observations are misclassified. Here, 37,000 customers were correctly classified as not returning and 30 were correctly classified as returning.
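The call itself, under the same placeholder names (note that `SDMTools` has since been archived on CRAN):

```r
library(SDMTools)

# Cross-tabulate predicted against observed classes;
# the default threshold of 0.5 is made explicit here
confMat <- confusion.matrix(churnData$returnCustomer,
                            churnData$predReturn,
                            threshold = 0.5)
confMat
```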
6. Accuracy

Comparing our classifications to the actual observations, we find that we have an accuracy of 80%. Pretty high! But even though the accuracy measure looks great, be careful: with a threshold of 0.5, the vast majority of customers is predicted not to return. This leads to a high number of correct predictions, but the cases where a customer returns to the shop are mostly misclassified. Hence, another choice of threshold is necessary.
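Accuracy can be read straight off the confusion matrix: the diagonal holds the correct classifications (true negatives and true positives), so as a sketch:

```r
# Correct predictions (diagonal) divided by all predictions
accuracy <- sum(diag(confMat)) / sum(confMat)
accuracy
```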
7. Finding the optimal threshold

Look at the table of potential payoffs. If a customer returns because, based on our predictions, he was sent a coupon, the payoff is assumed to be 5 euros on average. Wrongly classifying a customer as one that will churn, whereas she would have returned anyway, leads to a loss of 15 euros on average. Predicting the return of a customer does not cause any direct costs. The payoff therefore depends on the true negatives and the false negatives:

$$payoff = 5 \cdot true\ negatives - 15 \cdot false\ negatives$$

If I change the threshold, the payoff changes as well. The table shows that while the accuracy constantly decreases with a lower threshold, the contrary happens for the respective payoff, which reaches its maximum for a threshold of 0.3.
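A minimal sketch that recomputes this payoff over a grid of thresholds, again with the placeholder names from above:

```r
# Payoff per threshold: 5 euros for each true negative (correctly targeted
# churner), minus 15 euros for each false negative (returner flagged as churner)
thresholds <- seq(0.1, 0.5, by = 0.1)
payoffs <- sapply(thresholds, function(t) {
  pred <- ifelse(churnData$predReturn > t, 1, 0)
  obs  <- churnData$returnCustomer
  TN <- sum(pred == 0 & obs == 0, na.rm = TRUE)
  FN <- sum(pred == 0 & obs == 1, na.rm = TRUE)
  5 * TN - 15 * FN
})
data.frame(threshold = thresholds, payoff = payoffs)
```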
8. Overfitting

So far, we have only seen in-sample goodness-of-fit measures, where models are evaluated on the same data they were fitted on. This bears the risk of overfitting, remember? Next, we will learn ways to avoid it.
9. Let's try it out!

...but let's practice first!