Predicting probability of default
All of the data processing is complete, and it's time to begin creating predictions for probability of default. You want to train a LogisticRegression() model on the data and examine how it predicts the probability of default.
So that you can better grasp what the model produces with predict_proba, you should look at an example record alongside the predicted probability of default. How do the first five predictions look against the actual values of loan_status?
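To make the output of predict_proba more concrete, here is a minimal sketch on synthetic data. The arrays X and y and the name model are assumptions made up for illustration; they are not the course's cr_loan_prep data. The sketch shows that predict_proba returns one row per record and one column per class, so the second column holds the probability of the positive class (default, when defaults are coded as 1).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical, not the course data set)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression(solver="lbfgs").fit(X, y)
proba = model.predict_proba(X[:5])

print(model.classes_)      # class labels, in the same order as the columns of proba
print(proba.shape)         # one row per record, one column per class
print(proba.sum(axis=1))   # each row sums to 1
print(proba[:, 1])         # probability of class 1 for each of the five records
```

Selecting column 1 is what turns the two-column output into a single probability of default per record.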
The data set cr_loan_prep, along with X_train, X_test, y_train, and y_test, has already been loaded in the workspace.
This exercise is part of the course Credit Risk Modeling in Python.
Exercise instructions
- Train a logistic regression model on the training data and store it as clf_logistic.
- Use predict_proba() on the test data to create the predictions and store them in preds.
- Create two data frames, preds_df and true_df, to store the first five predictions and true loan_status values.
- Print the true_df and preds_df as one set using .concat().
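The steps above can be sketched end to end on synthetic data. Everything that builds the data here is an assumption for illustration: the feature names feature_a and feature_b, the random labels, and the train_test_split settings are hypothetical stand-ins, since the real cr_loan_prep and its splits are preloaded in the course workspace. Only the variable names from the instructions are reused.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the prepared loan data
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
})
noise = rng.normal(scale=0.5, size=n)
y = pd.DataFrame({
    "loan_status": (X["feature_a"] + X["feature_b"] + noise > 0).astype(int)
})

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

# np.ravel flattens the single-column y_train into the 1-D array that fit() expects
clf_logistic = LogisticRegression(solver="lbfgs").fit(X_train, np.ravel(y_train))

# Column 1 of predict_proba is the probability of class 1 (default)
preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:, 1][0:5], columns=["prob_default"])
true_df = y_test.head()

combined = pd.concat([true_df.reset_index(drop=True), preds_df], axis=1)
print(combined)
```

Passing y_train directly would typically still fit, but scikit-learn warns when it receives a column vector where a 1-D array is expected, which is why the flattening step is there.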
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Train the logistic regression model on the training data
____ = ____(solver='lbfgs').____(____, np.ravel(____))
# Create predictions of probability for loan status using test data
____ = clf_logistic.____(____)
# Create dataframes of first five predictions, and first five true labels
____ = pd.DataFrame(____[:,1][0:5], columns = ['prob_default'])
____ = y_test.____()
# Concatenate and print the two data frames for comparison
print(pd.____([true_df.reset_index(drop = True), preds_df], axis = 1))
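The reset_index(drop = True) call in the last line matters because concatenating with axis = 1 aligns rows by index label. The following small sketch uses made-up values (the index labels 17, 4, 52 and the probabilities are hypothetical) to show what happens when the true labels keep the shuffled index from a train/test split while the predictions carry a fresh default index.

```python
import pandas as pd

# Hypothetical labels whose index mimics rows drawn from a shuffled split
true_df = pd.DataFrame({"loan_status": [1, 0, 0]}, index=[17, 4, 52])

# Predictions built from a NumPy array get the default index 0, 1, 2
preds_df = pd.DataFrame({"prob_default": [0.81, 0.12, 0.33]})

# Without resetting, no index labels match, so rows are padded with NaN
misaligned = pd.concat([true_df, preds_df], axis=1)

# After resetting, both frames share the index 0, 1, 2 and line up row by row
aligned = pd.concat([true_df.reset_index(drop=True), preds_df], axis=1)

print(misaligned)
print(aligned)
```

Using drop=True discards the old index instead of adding it back as an extra column.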