Specifying a cut-off
We have shown how the choice of cut-off can make the difference between a good and a poor confusion matrix. Now you will learn how to transform the vector of predicted probabilities into a vector of binary values indicating loan status. The ifelse() function in R can help you here.

Applying ifelse() in the context of a cut-off, you would have something like

ifelse(predictions > 0.3, 1, 0)

The first argument tests whether a value in the predictions vector is bigger than 0.3. If this is TRUE, R returns 1 (specified in the second argument); if FALSE, R returns 0 (specified in the third argument). These represent "default" and "no default", respectively.
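As a quick illustration, here is ifelse() applied to a small made-up vector of predicted probabilities (the values below are invented for demonstration, not course data):

```r
# Toy vector of predicted default probabilities (made-up values)
predictions <- c(0.05, 0.42, 0.27, 0.81)

# Cut-off of 0.3: probabilities above 0.3 become 1 (default), others 0 (no default)
ifelse(predictions > 0.3, 1, 0)
# returns 0 1 0 1
```

Note that ifelse() is vectorized: it evaluates the condition element by element and returns a vector of the same length.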
This exercise is part of the course “Credit Risk Modeling in R”.
Exercise instructions
- The code for the full logistic regression model, along with the predictions vector, is given in your console.
- Using a cut-off of 0.15, create the vector pred_cutoff_15 by applying the ifelse() function to predictions_all_full.
- Look at the confusion matrix using table(), entering the true values (test_set$loan_status) as the first argument.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The code for the logistic regression model and the predictions is given below
log_model_full <- glm(loan_status ~ ., family = "binomial", data = training_set)
predictions_all_full <- predict(log_model_full, newdata = test_set, type = "response")
# Make a binary predictions-vector using a cut-off of 15%
# Construct a confusion matrix
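One possible completion of the exercise is sketched below. The course's training_set and test_set only exist in the DataCamp console, so this sketch simulates stand-in data (the variables age, loan_amnt, and the simulated coefficients are invented for illustration); the two task lines at the end follow the scaffold above.

```r
# Stand-in data (simulated), since the course's training_set/test_set
# are only available in the exercise console
set.seed(1)
n <- 1000
age <- rnorm(n, 40, 10)
loan_amnt <- rnorm(n, 10000, 3000)
loan_status <- rbinom(n, 1, plogis(-2 + 0.02 * (age - 40)))
d <- data.frame(loan_status, age, loan_amnt)
training_set <- d[1:700, ]
test_set <- d[701:1000, ]

# The code for the logistic regression model and the predictions
log_model_full <- glm(loan_status ~ ., family = "binomial", data = training_set)
predictions_all_full <- predict(log_model_full, newdata = test_set, type = "response")

# Make a binary predictions-vector using a cut-off of 15%
pred_cutoff_15 <- ifelse(predictions_all_full > 0.15, 1, 0)

# Construct a confusion matrix: true values first, predicted values second
table(test_set$loan_status, pred_cutoff_15)
```

With the true values in the first argument, the rows of the resulting table correspond to the actual loan status and the columns to the predicted status.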