Selecting relevant features
In this exercise, you'll identify the redundant columns in the volunteer dataset and perform feature selection on it to return a DataFrame of the relevant features.
For example, if you explore the volunteer dataset in the console, you'll see three features that are related to location: locality, region, and postalcode. They contain related information, so it would make sense to keep only one of the features.
Take some time to examine the features of volunteer in the console, and try to identify the redundant features.
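One way to check whether location columns overlap is to count distinct values and see whether one column is fully determined by another. The following is only a minimal sketch: the DataFrame `toy` and all of its values are invented for illustration and are not the real volunteer data.

```python
import pandas as pd

# Hypothetical stand-in for the volunteer dataset; every value here is invented.
toy = pd.DataFrame({
    "locality": ["Harlem", "Harlem", "Astoria", "Astoria"],
    "region": ["NY", "NY", "NY", "NY"],
    "postalcode": [10027, 10027, 11102, 11103],
})

# Count distinct values per column: a column with a single value carries no information.
distinct_counts = toy.nunique()
print(distinct_counts)

# Count how many localities appear under each postalcode. If the maximum is 1,
# locality is implied by postalcode in this sample and is a candidate to drop.
localities_per_postalcode = toy.groupby("postalcode")["locality"].nunique()
print(localities_per_postalcode.max() == 1)
```

In this toy sample, region is constant and each postalcode maps to a single locality, which is the kind of overlap the exercise asks you to look for.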
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Create a list of redundant column names and store it in the to_drop variable:
  - Out of all the location-related features, keep only postalcode.
  - Features that have gone through the feature engineering process are redundant as well.
- Drop the columns in the to_drop list from the dataset.
- Print out the .head() of volunteer_subset to see the selected columns.
Interactive exercise
Try this exercise by completing this sample code.
# Create a list of redundant column names to drop
to_drop = ["____", "____", "____", "____", "____"]
# Drop those columns from the dataset
volunteer_subset = ____.____(____, ____)
# Print out the head of volunteer_subset
print(____)
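
The same pattern can be tried on a small made-up DataFrame before filling in the blanks. This is a sketch under stated assumptions, not the exercise solution: `toy`, `created_date`, and `created_month` are hypothetical names chosen to illustrate a raw column and its engineered counterpart, and only locality, region, and postalcode come from the exercise text.

```python
import pandas as pd

# Hypothetical stand-in for the volunteer dataset; values are invented.
toy = pd.DataFrame({
    "locality": ["Harlem", "Astoria"],
    "region": ["NY", "NY"],
    "postalcode": [10027, 11102],
    "created_date": ["2011-01-14", "2011-02-01"],  # hypothetical raw column
    "created_month": [1, 2],                       # hypothetical engineered column
})

# Redundant columns: location features other than postalcode, plus the raw
# column that has already been turned into an engineered feature.
to_drop = ["locality", "region", "created_date"]

# drop() returns a new DataFrame; axis=1 tells pandas to drop columns, not rows.
toy_subset = toy.drop(to_drop, axis=1)

# Inspect the remaining columns.
print(toy_subset.head())
```

Because `drop` returns a new DataFrame by default, `toy` itself is left unchanged, which makes it easy to compare the original and the subset side by side.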