
Sorting important features

Among other things, Decision Trees are very popular because of their interpretability. Many models can provide accurate predictions, but a Decision Tree can also quantify the effect of the different features on the target. Here, it can tell you which features have the strongest and weakest impact on the decision to leave the company. In sklearn, you can get this information through the feature_importances_ attribute.
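As a quick, self-contained illustration of where this attribute comes from (the toy data and column names below are made up and are not the course's HR dataset), a fitted DecisionTreeClassifier exposes one importance score per feature:

# Minimal sketch: fit a small tree and read its feature importances
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two numeric features and a binary "left the company" target
X = pd.DataFrame({"satisfaction": [0.9, 0.2, 0.8, 0.1, 0.7, 0.3],
                  "monthly_hours": [150, 280, 160, 270, 155, 260]})
y = [0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# One score per column; the scores sum to 1
print(tree.feature_importances_)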

In this exercise, you're going to get the quantified importance of each feature, save it in a pandas DataFrame (a Pythonic table), and sort the features from most important to least important. The model_best Decision Tree Classifier used in the previous exercises is available in your workspace, as are the features_test and features_train variables.

pandas has been imported as pd.

This exercise is part of the course “HR Analytics: Predicting Employee Churn in Python”.


Exercise instructions

  • Use the feature_importances_ attribute to calculate relative feature importances
  • Create a list of features
  • Save the results in a DataFrame using the DataFrame() function, with the features as the index and their importance values in a column named "importance"
  • Sort the relative_importances DataFrame with the sort_values() function so the most important features appear on top, and print the result

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Calculate feature importances
feature_importances = model_best.____

# Create a list of features: done
feature_list = list(features)

# Save the results inside a DataFrame using feature_list as an index
relative_importances = pd.____(index=____, data=feature_importances, columns=["importance"])

# Sort values to learn most important features and print the result
print(relative_importances.____(by="importance", ascending=False))
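
For reference, one possible completion of the blanks (assuming the model_best classifier and the features DataFrame provided in the course workspace):

# Calculate feature importances
feature_importances = model_best.feature_importances_

# Create a list of features
feature_list = list(features)

# Save the results inside a DataFrame using feature_list as an index
relative_importances = pd.DataFrame(index=feature_list,
                                    data=feature_importances,
                                    columns=["importance"])

# Sort values so the most important features come first, then print the result
print(relative_importances.sort_values(by="importance", ascending=False))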

