Gradient boosting feature importances
As with random forests, we can extract feature importances from gradient boosting models to understand which features are the best predictors. It's often useful to fit several tree-based models and compare their feature importances, since this helps average out peculiarities that may arise from any one model.
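For example, a minimal sketch (on synthetic data, not part of this exercise; the variable names here are only illustrative) of comparing importances from two tree-based models could look like this:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic regression data stands in for the course's stock features
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

rf = RandomForestRegressor(random_state=0).fit(X, y)
gb = GradientBoostingRegressor(random_state=0).fit(X, y)

# Averaging the importance arrays smooths out quirks of any single model
avg_importances = (rf.feature_importances_ + gb.feature_importances_) / 2
print(avg_importances)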
The feature importances are stored as a numpy array in the .feature_importances_ property of the gradient boosting model. To make a nice plot, we'll need the sorted indices of the feature importances, which we can get with np.argsort(). We want the features ordered from largest to smallest importance, so we use Python's slice indexing to reverse the sorted indices, e.g. np.argsort(feature_importances)[::-1].
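To see how that reversal works, here is a tiny standalone illustration with a made-up importance array:

import numpy as np

importances = np.array([0.1, 0.5, 0.4])

# argsort returns indices from smallest to largest value: [0, 2, 1]
sorted_index = np.argsort(importances)

# [::-1] reverses them so the most important feature comes first: [1, 2, 0]
print(sorted_index[::-1])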
This exercise is part of the course Machine Learning for Finance in Python.
Exercise instructions
- Reverse the sorted_index variable to go from greatest to least using Python indexing.
- Create the sorted feature labels list as labels by converting feature_names to a numpy array and indexing it with sorted_index.
- Create a bar plot of x, with feature_importances indexed by the sorted_index variable as the bar heights and labels as the tick labels.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Extract feature importances from the fitted gradient boosting model
feature_importances = gbr.feature_importances_
# Get the indices of the largest to smallest feature importances
sorted_index = np.argsort(feature_importances)[::____]
# x-axis positions for the bars, one per feature
x = range(features.shape[1])
# Create tick labels
labels = np.array(feature_names)[____]
plt.bar(____, feature_importances[____], tick_label=____)
# Rotate the tick labels (set via tick_label above) so the feature names are readable
plt.xticks(rotation=90)
plt.show()
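For reference, one possible completed version of the template (assuming, as in the exercise environment, that gbr, features, and feature_names are already loaded; the imports are added here only so the snippet stands on its own) is:

import numpy as np
import matplotlib.pyplot as plt

# Extract feature importances from the fitted gradient boosting model
feature_importances = gbr.feature_importances_

# Indices of the importances from largest to smallest
sorted_index = np.argsort(feature_importances)[::-1]
x = range(features.shape[1])

# Feature names reordered to match the descending importances
labels = np.array(feature_names)[sorted_index]

# Bar heights are the sorted importances; feature names label the ticks
plt.bar(x, feature_importances[sorted_index], tick_label=labels)
plt.xticks(rotation=90)
plt.show()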