Get startedGet started for free

Predicting probability of default

All of the data processing is complete and it's time to begin creating predictions for probability of default. You want to train a LogisticRegression() model on the data, and examine how it predicts the probability of default.

So that you can better grasp what the model produces with predict_proba, you should look at an example record alongside the predicted probability of default. How do the first five predictions look against the actual values of loan_status?

The data set cr_loan_prep along with X_train, X_test, y_train, and y_test have already been loaded in the workspace.

This exercise is part of the course

Credit Risk Modeling in Python

View Course

Exercise instructions

  • Train a logistic regression model on the training data and store it as clf_logistic.
  • Use predict_proba() on the test data to create the predictions and store them in preds.
  • Create two data frames, preds_df and true_df, to store the first five predictions and true loan_status values.
  • Print the true_df and preds_df as one set using .concat().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Train the logistic regression model on the training data
____ = ____(solver='lbfgs').____(____, np.ravel(____))

# Create predictions of probability for loan status using test data
____ = clf_logistic.____(____)

# Create dataframes of first five predictions, and first five true labels
____ = pd.DataFrame(____[:,1][0:5], columns = ['prob_default'])
____ = y_test.____()

# Concatenate and print the two data frames for comparison
print(pd.____([true_df.reset_index(drop = True), preds_df], axis = 1))
Edit and Run Code