Exercise

Assessing out-of-sample model fit

You now know that it makes more sense to look at the out-of-sample model fit than the in-sample fit. In this exercise, you therefore want to come up with an out-of-sample accuracy measure.

First, though, you need to do some preparatory steps. Use defaultData again. logitModelNew is already loaded in your environment.

Be aware that a complete analysis would also (and especially) compare the different model candidates on out-of-sample data.

The in-sample accuracy, using the optimal threshold of 0.3, is 0.7922901. Make sure you understand whether there is overfitting: an out-of-sample accuracy clearly below the in-sample value would indicate it.

Instructions
  • First, split the dataset randomly into a training set and a test set. The training set should contain 2/3 of the overall data.

  • Then, fit the model on the training set and call it logitTrainNew. Use the given formula.

  • Make predictions on the test set and then calculate the out-of-sample accuracy with the help of a confusion matrix. Note that SDMTools can no longer be downloaded from CRAN. On your personal computer, install it instead via remotes::install_version("SDMTools", "1.1-221").

  • Compare the out-of-sample accuracy to the in-sample value, given above.
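
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exercise solution: toyData, its columns x1, x2 and default, and the model formula are hypothetical stand-ins for defaultData and the given formula, and base R's table() is used in place of the SDMTools confusion matrix so that the sketch runs without that package.

```r
# Hypothetical toy data standing in for defaultData (all names are assumptions)
set.seed(42)
n <- 3000
toyData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toyData$default <- rbinom(n, 1, plogis(-1 + 0.8 * toyData$x1 - 0.5 * toyData$x2))

# 1. Random split: 2/3 of the rows go to the training set, the rest to the test set
trainIdx <- sample(seq_len(n), size = round(2 / 3 * n))
train <- toyData[trainIdx, ]
test <- toyData[-trainIdx, ]

# 2. Fit the logistic regression on the training set only
#    (placeholder formula; the exercise supplies its own)
logitTrainNew <- glm(default ~ x1 + x2, family = binomial, data = train)

# Helper: accuracy of a fitted model on a data set at a given threshold
accuracyAt <- function(model, data, threshold) {
  predProb <- predict(model, newdata = data, type = "response")
  predClass <- as.integer(predProb > threshold)
  # Fix the factor levels so the matrix is 2 x 2 even if one class is never predicted
  confMatrix <- table(
    observed = factor(data$default, levels = 0:1),
    predicted = factor(predClass, levels = 0:1)
  )
  list(confMatrix = confMatrix,
       accuracy = sum(diag(confMatrix)) / sum(confMatrix))
}

# 3. Out-of-sample accuracy on the test set, using the threshold of 0.3
outOfSample <- accuracyAt(logitTrainNew, test, 0.3)

# 4. In-sample accuracy of the same model on the training set, for comparison
inSample <- accuracyAt(logitTrainNew, train, 0.3)

print(outOfSample$confMatrix)
print(c(inSample = inSample$accuracy, outOfSample = outOfSample$accuracy))
```

If the two printed accuracies are close, there is little sign of overfitting. In the exercise itself, the comparison is against the in-sample value of 0.7922901 stated above.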