
Creating training and test sets

You've just trained LogisticRegression() models on different columns.

You know that the data should be separated into training and test sets. train_test_split() creates both at the same time. The training set is used to fit the model, while the test set is used to evaluate it. Without evaluating the model, you have no way to tell how well it will perform on new loan data.
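As a rough illustration of the split-then-evaluate idea, here is a minimal sketch on synthetic data. The column names (int_rate, emp_length, status) and the generated values are hypothetical stand-ins, not the contents of cr_loan_clean.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for loan data; column names and values are hypothetical
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "int_rate": rng.uniform(5, 25, n),
    "emp_length": rng.integers(0, 30, n).astype(float),
})
# Make default (status = 1) more likely at higher interest rates
p_default = 1 / (1 + np.exp(-0.3 * (X["int_rate"] - 15)))
y = pd.DataFrame({"status": (rng.uniform(size=n) < p_default).astype(int)})

# Hold out 40% of the rows for evaluation; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

# Fit on the training rows only, then score on rows the model has not seen
clf = LogisticRegression(solver="lbfgs").fit(X_train, np.ravel(y_train))
test_accuracy = clf.score(X_test, np.ravel(y_test))

print(X_train.shape, X_test.shape)
print(test_accuracy)
```

With test_size=0.4 and 1,000 rows, 600 rows go to training and 400 to testing. The accuracy printed at the end is computed only on the held-out rows.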

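In addition to .intercept_, which is an attribute of the model, LogisticRegression() models also have the .coef_ attribute. Each coefficient is the change in the log-odds of default for a one-unit increase in that column, so it indicates how strongly, and in which direction, the column influences the predicted probability of default.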
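To make the role of these two attributes concrete, the sketch below fits a model on synthetic data with a known relationship and then rebuilds the predicted probabilities by hand from .coef_ and .intercept_. The data and the "true" weights are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic features with a known relationship to the outcome (illustrative only)
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
# Feature 0 raises the log-odds, feature 1 lowers them, feature 2 has no effect
log_odds = -1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-log_odds))).astype(int)

clf = LogisticRegression(solver="lbfgs").fit(X, y)

print(clf.coef_)       # one row, one coefficient per column: shape (1, 3)
print(clf.intercept_)  # a single intercept: shape (1,)

# Rebuild the probability of class 1 from the fitted parameters:
# sigmoid(intercept + sum of coefficient * feature value)
linear_part = X @ clf.coef_.ravel() + clf.intercept_[0]
manual_prob = 1 / (1 + np.exp(-linear_part))
matches = np.allclose(manual_prob, clf.predict_proba(X)[:, 1])
print(matches)
```

The sign of each coefficient gives the direction of the effect. The magnitudes are only directly comparable when the columns are on similar scales, as they are in this synthetic example.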

The data set cr_loan_clean is already loaded in the workspace.

This exercise is part of the course

Credit Risk Modeling in Python


Exercise instructions

  • Create the data set X using interest rate, employment length, and income. Create the y set using loan status.
  • Use train_test_split() to create the training and test sets from X and y.
  • Create and train a LogisticRegression() model and store it as clf_logistic.
  • Print the coefficients of the model using .coef_.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the X and y data sets
X = ____[[____,____,____]]
y = ____[[____]]

# Use train_test_split to create the training and test sets
X_train, X_test, y_train, y_test = ____(____, ____, test_size=.4, random_state=123)

# Create and fit the logistic regression model
____ = ____(solver='lbfgs').____(____, np.ravel(____))

# Print the model's coefficients
print(____.coef_)