Assessing out-of-sample model fit

You now know that it makes more sense to look at the out-of-sample model fit than at the in-sample fit. In this exercise, you will therefore compute an out-of-sample accuracy measure.

First, though, you need to take a few preparatory steps. Use defaultData again. logitModelNew is already loaded in your environment.

Be aware that in a complete analysis you would also (and especially) compare the different candidate models on out-of-sample data.

The in-sample accuracy, using the optimal threshold of 0.3, is 0.7922901. Use it to check whether the model is overfitting: an out-of-sample accuracy clearly below this value would point to overfitting.

First, randomly split the dataset into a training set and a test set. The training set should contain 2/3 of the data.
Then fit the model on the training set and call it logitTrainNew. Use the given formula.
Make predictions on the test set, then calculate the out-of-sample accuracy with the help of a confusion matrix. Note that SDMTools can no longer be downloaded from CRAN. On your own computer, install it via remotes::install_version("SDMTools", "1.1-221.2") instead.
Compare the out-of-sample accuracy to the in-sample value given above.
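
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exercise solution: the data are simulated, and the names simData, limitBal, age and default as well as the model formula are hypothetical stand-ins for defaultData and the formula given in the exercise. The confusion matrix is built with base R's table() rather than SDMTools, so the sketch runs without that package.

```r
# Simulated stand-in for defaultData (hypothetical columns, not the course data)
set.seed(534381)
n <- 3000
simData <- data.frame(limitBal = rnorm(n), age = rnorm(n))
simData$default <- rbinom(n, 1, plogis(-1 + 0.8 * simData$limitBal - 0.5 * simData$age))

# Random split: 2/3 of the rows for training, the remaining 1/3 for testing
trainIdx <- sample(seq_len(n), size = floor(2 * n / 3))
train <- simData[trainIdx, ]
test  <- simData[-trainIdx, ]

# Fit the logistic regression on the training set only
# (placeholder formula; the exercise supplies its own)
logitTrainNew <- glm(default ~ limitBal + age, family = binomial, data = train)

# Predicted default probabilities for the held-out test set
test$predNew <- predict(logitTrainNew, newdata = test, type = "response")

# Classify with the threshold of 0.3 and tabulate against the observed classes
threshold <- 0.3
predClass <- factor(as.integer(test$predNew > threshold), levels = c(0, 1))
obsClass  <- factor(test$default, levels = c(0, 1))
confMatrix <- table(observed = obsClass, predicted = predClass)

# Accuracy = correctly classified cases (diagonal) / all test cases
accuracyOut <- sum(diag(confMatrix)) / sum(confMatrix)
accuracyOut
```

With the real data, accuracyOut would be the number to set against the in-sample accuracy of 0.7922901. The value produced by this simulated example has no bearing on that comparison.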
