Selecting important features
In this exercise, your task is to select only the most important features that will be used by the final model. Remember that the relative importances are saved in the column importance of the DataFrame called relative_importances.
This exercise is part of the course HR Analytics: Predicting Employee Churn in Python.
Exercise instructions
- Select only the features with an importance value higher than 1%.
- Create a list from those features and print them (this has been done for you).
- Using the index saved in selected_list, transform both features_train and features_test to include only the features with an importance higher than 1%.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# select only features with relative importance higher than 1%
selected_features = relative_importances[relative_importances.____>0.01]
# create a list from those features: done
selected_list = selected_features.index
# transform both features_train and features_test components to include only selected features
features_train_selected = features_train[selected_list]
features_test_selected = ____[____]
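
The steps above can be tried outside the course environment with a small stand-in dataset. The sketch below is one possible way to do this, not the course's own solution: the feature names, importance values, and rows are made up for illustration, whereas in the exercise relative_importances, features_train, and features_test are already loaded for you.

```python
import pandas as pd

# Hypothetical importances (made-up values); in the exercise this DataFrame
# is provided, with one row per feature and a column named "importance".
relative_importances = pd.DataFrame(
    {"importance": [0.45, 0.30, 0.245, 0.005]},
    index=["satisfaction", "time_spend_company", "evaluation", "work_accident"],
)

# Toy train/test feature tables with the same (hypothetical) columns.
features_train = pd.DataFrame({
    "satisfaction": [0.38, 0.80, 0.11],
    "time_spend_company": [3, 6, 4],
    "evaluation": [0.53, 0.86, 0.88],
    "work_accident": [0, 0, 1],
})
features_test = pd.DataFrame({
    "satisfaction": [0.72, 0.37],
    "time_spend_company": [5, 3],
    "evaluation": [0.87, 0.52],
    "work_accident": [0, 1],
})

# Keep only the rows (features) whose importance exceeds 1%.
mask = relative_importances["importance"] > 0.01
selected_features = relative_importances[mask]

# The index of the filtered DataFrame holds the selected feature names.
selected_list = selected_features.index
print(list(selected_list))

# Apply the same column selection to both the training and the test set.
features_train_selected = features_train[selected_list]
features_test_selected = features_test[selected_list]
```

Using the same selected_list for both tables keeps the training and test sets aligned on identical columns, which the final model needs when it predicts on the test data.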