Exercise

Random forest feature importances

One useful aspect of tree-based methods is the ability to extract feature importances. This is a quantitative way to measure how much each feature contributes to our predictions. It can help us focus on our best features, possibly enhancing or tuning them, and can also help us get rid of useless features that may be cluttering up our model.

Tree models in sklearn have a .feature_importances_ property, accessible after fitting the model, which stores the importance score of each feature. To make a nice-looking bar plot sorted from greatest to least importance, we need the indices that sort the scores; np.argsort() returns these in ascending order, so we reverse them to get a descending sort.
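
For example, here is a tiny demonstration of the reversal (the scores array is made up for illustration):

    import numpy as np

    scores = np.array([0.1, 0.5, 0.4])
    np.argsort(scores)        # array([0, 2, 1]) -- ascending order
    np.argsort(scores)[::-1]  # array([1, 2, 0]) -- descending, greatest first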

Instructions

  • Use the feature_importances_ property of our random forest model (rfr) to extract feature importances into the importances variable.
  • Use numpy's argsort() to get the indices that sort the feature importances from greatest to least, and save them in the sorted_index variable.
  • Set the xtick labels to the feature names, stored in the labels variable, using the sorted_index list. feature_names must be converted to a numpy array so it can be indexed with sorted_index. (A sketch of this pattern follows the list.)
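
Putting the steps together, here is a minimal sketch of the pattern, not the exercise's exact solution. In the exercise, rfr is already fitted and feature_names already exists, so the training data and feature names below are hypothetical stand-ins added only to make the snippet self-contained:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical stand-in data; in the exercise, rfr and
    # feature_names are already provided.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=200)
    feature_names = ['open', 'high', 'low', 'volume']  # hypothetical names

    rfr = RandomForestRegressor(n_estimators=100, random_state=0)
    rfr.fit(X, y)

    # Extract importances, then get indices sorted greatest to least
    importances = rfr.feature_importances_
    sorted_index = np.argsort(importances)[::-1]
    x = range(len(importances))

    # Convert feature_names to a numpy array so it can be
    # indexed with the sorted_index list
    labels = np.array(feature_names)[sorted_index]

    plt.bar(x, importances[sorted_index], tick_label=labels)
    plt.show()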