Creating training and test sets
You've just trained LogisticRegression() models on different columns.
You know that the data should be separated into training and test sets. train_test_split() creates both at the same time. The training set is used to fit the model, while the test set is used to evaluate it. Without evaluating the model, you have no way to tell how well it will perform on new loan data.
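As a rough illustration of the split described above, the following sketch calls scikit-learn's train_test_split() on a small synthetic table. The column names (interest_rate, income, default_flag) and the random values are made up for this example and are not the course's loan data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for loan data; column names and values are hypothetical.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "interest_rate": rng.uniform(5, 25, size=100),
    "income": rng.uniform(20_000, 120_000, size=100),
})
y = pd.Series(rng.integers(0, 2, size=100), name="default_flag")

# One call returns all four pieces; test_size=0.4 holds out 40% of the rows.
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

print(X_train.shape, X_test.shape)  # (60, 2) (40, 2)
```

The rows in X_test are never seen during fitting, which is what makes them usable as a stand-in for new loans.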
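In addition to .intercept_, which is an attribute of the model, LogisticRegression() models also have the .coef_ attribute. It holds one coefficient per training column, showing how strongly, and in which direction, that column affects the predicted probability of default. Coefficient sizes are only directly comparable when the columns are on similar scales.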
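To make the two attributes concrete, here is a minimal sketch on synthetic data, assuming three standardized feature columns whose true effects are chosen by hand. The data and the variable name clf are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic features on the same scale (standard normal), so the
# coefficient sizes can be compared with each other.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))

# Made-up rule: column 0 has a strong positive effect, column 1 a weak
# negative effect, and column 2 no effect at all.
logits = 2.0 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

clf = LogisticRegression(solver="lbfgs").fit(X, y)

# For a binary target there is one row of coefficients, one per column,
# and a single intercept.
print(clf.coef_.shape)       # (1, 3)
print(clf.intercept_.shape)  # (1,)
print(clf.coef_)
```

Each coefficient is the change in the log-odds of the positive class for a one-unit increase in that column, so a column measured in large units (such as raw income) will tend to have a small coefficient even if it matters.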
The data set cr_loan_clean is already loaded in the workspace.
This exercise is part of the course Credit Risk Modeling in Python.
Exercise instructions
- Create the data set X using interest rate, employment length, and income. Create the y set using loan status.
- Use train_test_split() to create the training and test sets from X and y.
- Create and train a LogisticRegression() model and store it as clf_logistic.
- Print the coefficients of the model using .coef_.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the X and y data sets
X = ____[[____,____,____]]
y = ____[[____]]
# Use train_test_split to create the training and test sets
X_train, X_test, y_train, y_test = ____(____, ____, test_size=.4, random_state=123)
# Create and fit the logistic regression model
____ = ____(solver='lbfgs').____(____, np.ravel(____))
# Print the model's coefficients
print(____.coef_)
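For readers without access to the course workspace, the same sequence of steps can be sketched end to end on a synthetic table. Everything about the data here is an assumption: the DataFrame loans, its column names (int_rate, emp_length, income, status), and the rule that generates defaults are invented for illustration and do not reflect cr_loan_clean.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the loan table; names and values are made up.
rng = np.random.default_rng(42)
n = 1000
loans = pd.DataFrame({
    "int_rate": rng.uniform(5, 25, n),
    "emp_length": rng.integers(0, 30, n).astype(float),
    "income": rng.uniform(20_000, 150_000, n),
})
# Invented rule: higher interest rates raise the probability of default.
p_default = 1 / (1 + np.exp(-0.3 * (loans["int_rate"] - 15)))
loans["status"] = (rng.random(n) < p_default).astype(int)

# Create the X and y data sets
X = loans[["int_rate", "emp_length", "income"]]
y = loans[["status"]]

# Split into training (60%) and test (40%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.4, random_state=123
)

# Fit on the training set only; np.ravel flattens the one-column
# DataFrame into the 1-D array that fit() expects for the target.
clf_logistic = LogisticRegression(solver="lbfgs").fit(X_train, np.ravel(y_train))

# One coefficient per column, in the same order as X's columns
print(clf_logistic.coef_)

# Evaluate on rows the model has not seen
test_accuracy = clf_logistic.score(X_test, np.ravel(y_test))
print(test_accuracy)
```

Because income is left in raw units, scikit-learn may emit a convergence warning with the default iteration limit; the model is still fitted, and scaling the columns first is one common way to avoid the warning.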