Sorting important features
Among other things, Decision Trees are very popular because of their interpretability. Many models can provide accurate predictions, but Decision Trees can also quantify the effect of the different features on the target. Here, the model can tell you which features have the strongest and weakest impacts on the decision to leave the company. In sklearn, you can get this information by using the feature_importances_ attribute.
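As a minimal sketch of what this attribute returns (using hypothetical toy data, not the course's HR dataset): a fitted tree exposes one importance score per feature, and the scores sum to 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 100 samples, 3 features
rng = np.random.default_rng(42)
X = rng.random((100, 3))
# Target driven entirely by the first feature
y = (X[:, 0] > 0.5).astype(int)

model = DecisionTreeClassifier(random_state=42).fit(X, y)

# One score per feature; higher means the feature contributed more to the splits
print(model.feature_importances_)
```

Here the first feature should dominate, since the target depends only on it.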
In this exercise, you're going to get the quantified importance of each feature, save the values in a pandas DataFrame (a Pythonic table), and sort the features from most important to least important. The model_best Decision Tree Classifier used in the previous exercises is available in your workspace, as well as the features_test and features_train variables. pandas has been imported as pd.
This exercise is part of the course “HR Analytics: Predicting Employee Churn in Python”.
Exercise instructions
- Use the feature_importances_ attribute to calculate relative feature importances.
- Create a list of features.
- Save the results inside a DataFrame using the DataFrame() function, where the features are rows and their respective values are a column.
- Sort the relative_importances DataFrame to get the most important features on top using the sort_values() function, and print the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
```python
# Calculate feature importances
feature_importances = model_best.____

# Create a list of features: done
feature_list = list(features)

# Save the results inside a DataFrame using feature_list as an index
relative_importances = pd.____(index=____, data=feature_importances, columns=["importance"])

# Sort values to learn most important features
relative_importances.____(by="importance", ascending=False)
```
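One possible completion of the blanks is sketched below. Since model_best and features live in the course workspace and are not available here, they are recreated with hypothetical stand-in data (made-up column names such as "satisfaction") purely so the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins for the workspace objects model_best and features
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.random((50, 3)),
                        columns=["satisfaction", "evaluation", "tenure"])
target = (features["satisfaction"] < 0.3).astype(int)
model_best = DecisionTreeClassifier(random_state=0).fit(features, target)

# Calculate feature importances
feature_importances = model_best.feature_importances_

# Create a list of features (iterating a DataFrame yields its column names)
feature_list = list(features)

# Save the results inside a DataFrame using feature_list as an index
relative_importances = pd.DataFrame(index=feature_list,
                                    data=feature_importances,
                                    columns=["importance"])

# Sort values to learn most important features
sorted_importances = relative_importances.sort_values(by="importance",
                                                      ascending=False)
print(sorted_importances)
```

With this toy target, "satisfaction" should land on top, since it alone determines the label.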