Random forest feature importances
One useful aspect of tree-based methods is the ability to extract feature importances. This is a quantitative way to measure how much each feature contributes to our predictions. It can help us focus on our best features, possibly enhancing or tuning them, and can also help us get rid of useless features that may be cluttering up our model.
Tree models in sklearn have a .feature_importances_ property that's accessible after fitting the model, and it stores the feature importance scores. To make a nice-looking bar plot of feature importances sorted from greatest to least, we need to get the indices of the sorted importances using np.argsort().
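Before the exercise itself, here is a minimal, self-contained sketch of that workflow. The synthetic data, feature names, and model settings below are illustrative assumptions, not the course's dataset:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

# Illustrative synthetic data -- the course exercise uses its own features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
feature_names = ['f0', 'f1', 'f2', 'f3']

# .feature_importances_ only exists after the model has been fit
rfr = RandomForestRegressor(n_estimators=100, random_state=0)
rfr.fit(X, y)
importances = rfr.feature_importances_

# np.argsort() sorts ascending, so reverse with [::-1] for greatest-first
sorted_index = np.argsort(importances)[::-1]
print(np.array(feature_names)[sorted_index])
print(importances[sorted_index])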
This exercise is part of the course Machine Learning for Finance in Python.
Exercise instructions
- Use the feature_importances_ property of our random forest model (rfr) to extract feature importances into the importances variable.
- Use numpy's argsort to get indices of the feature importances from greatest to least, and save the sorted indices in the sorted_index variable.
- Set the xtick labels to be the feature names in the labels variable, using the sorted_index list. feature_names must be converted to a numpy array so we can index it with the sorted_index list.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import numpy as np
import matplotlib.pyplot as plt

# rfr is the fitted random forest model provided by the exercise

# Get feature importances from our random forest model
importances = rfr.____

# Get the index of importances from greatest importance to least
sorted_index = ____(importances)[::-1]
x = range(len(importances))

# Create tick labels from the feature names, in sorted order
labels = np.array(____)[____]
plt.bar(x, importances[sorted_index], tick_label=labels)

# Rotate tick labels to vertical so they don't overlap
plt.xticks(rotation=90)
plt.show()
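Beyond plotting, these scores can drive the cleanup mentioned above: dropping features that contribute almost nothing. A hedged sketch, where train_features is a hypothetical name for the matrix rfr was fit on and the 0.01 cutoff is an arbitrary illustrative choice:

# Keep only features whose importance clears the (illustrative) 0.01 cutoff
keep_mask = importances > 0.01

# train_features is a hypothetical name for the matrix rfr was fit on
reduced_features = np.array(train_features)[:, keep_mask]
print(np.array(feature_names)[keep_mask])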