Feature importances

1. Feature importances

Great work so far! Your random forest model was already performing really well, and you then supercharged it by tuning its hyperparameters. Now, it's time to dig in further and look at feature importances.

2. Feature importances

Tree-based methods such as random forests allow us to calculate feature importances: scores that represent how much each feature contributes to a prediction. Visualizing these feature importances is an effective way to communicate results to stakeholders. They can reveal which features drive churn and which features can be removed from the model.

3. Interpretability vs accuracy

Recall from the previous chapter that different models have different strengths and weaknesses. For example, deep learning techniques tend to perform well across many domains, but the neural networks that underlie them are very hard to interpret. In contrast, simpler techniques like logistic regression might not perform as well, but they are easier to interpret. That interpretability can determine whether or not you get buy-in from stakeholders, so do not underestimate its importance. The value of a churn model lies not only in identifying which customers are at risk of churning, but also in revealing what the drivers of churn are, so let's see how to compute feature importances for our random forest model.

4. Random forest feature importances

Here's some code that instantiates a random forest classifier and fits it to our training data. After fitting the model, you can calculate the feature importances through the feature_importances_ attribute of the fitted random forest object; a sketch of this workflow follows below. But just looking at the raw numbers isn't very informative, so in the interactive exercises you'll learn how to visualize these importances with a plot. Which features do you think will be the most important in predicting churn?
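The slide's code itself does not appear in this transcript, so here is a minimal sketch of the workflow it describes, assuming scikit-learn. Since the course's churn dataset is also not shown here, synthetic data from make_classification stands in for the training set, and the feature names are purely illustrative:

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the training data; the feature names below
    # are hypothetical, not the course's actual churn columns
    X, y = make_classification(n_samples=500, n_features=4, random_state=42)
    feature_names = ["total_day_charge", "total_intl_calls",
                     "customer_service_calls", "account_length"]
    X_train = pd.DataFrame(X, columns=feature_names)
    y_train = y

    # Instantiate the random forest classifier and fit it to the training data
    clf = RandomForestClassifier(random_state=42)
    clf.fit(X_train, y_train)

    # feature_importances_ holds one score per feature; the scores sum to 1
    print(clf.feature_importances_)

    # Pairing the scores with feature names makes the raw numbers readable
    importances = pd.Series(clf.feature_importances_, index=X_train.columns)
    print(importances.sort_values(ascending=False))

Sorting the named scores like this also sets up the visualization you will build in the exercises, where each feature's importance becomes one bar in a plot.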

5. Let's practice!

It's time to find out!