Column importance and default prediction
When using multiple training sets with many different groups of columns, it's important to keep an eye on which columns matter and which do not. It can be expensive or time-consuming to maintain a set of columns even though they might not have any impact on loan_status.
The X data for this exercise was created with the following code:
X = cr_loan_prep[['person_income','loan_int_rate',
                  'loan_percent_income','loan_amnt',
                  'person_home_ownership_MORTGAGE','loan_grade_F']]
Train an XGBClassifier() model on this data, and check the column importance to see how much each column contributes to predicting loan_status.
The cr_loan_prep data set, along with X_train and y_train, has been loaded in the workspace.
This exercise is part of the course
Credit Risk Modeling in Python
Instructions
- Create and train an XGBClassifier() model on the X_train and y_train training sets and store it as clf_gbt.
- Print the column importances for the columns in clf_gbt by using .get_booster() and .get_score().
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Create and train the model on the training data
____ = xgb.____().____(____,np.ravel(____))
# Print the column importances from the model
print(clf_gbt.____().____(importance_type = 'weight'))
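Once the importances are printed, it can help to rank them and check which columns are missing. With importance_type = 'weight', each score is the number of times a column is used to split the data across all trees, and columns that are never used in a split are left out of the dictionary entirely. The sketch below shows one way to read such a result; the scores are made-up placeholders, not output from the exercise's model.

```python
# Columns in X, as listed in the exercise.
columns = ['person_income', 'loan_int_rate', 'loan_percent_income',
           'loan_amnt', 'person_home_ownership_MORTGAGE', 'loan_grade_F']

# Made-up placeholder scores (NOT real model output), shaped like the
# dictionary returned by get_score(importance_type='weight').
scores = {'person_income': 300.0, 'loan_int_rate': 250.0,
          'loan_percent_income': 120.0, 'loan_amnt': 90.0,
          'person_home_ownership_MORTGAGE': 15.0}

# Rank columns from most to least frequently used in splits.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, count in ranked:
    print(f"{name}: {count:.0f} splits")

# Columns absent from the dictionary were never used in any split.
unused = [c for c in columns if c not in scores]
print("Never used in a split:", unused)
```

A column that lands in the unused list, or at the bottom of the ranking, is a candidate for removal if it is costly to maintain, though that decision is best confirmed by comparing model performance with and without it.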