
Selecting relevant features

In this exercise, you'll identify the redundant columns in the volunteer dataset and perform feature selection to return a DataFrame containing only the relevant features.

For example, if you explore the volunteer dataset in the console, you'll see three features related to location: locality, region, and postalcode. Because they contain overlapping information, it makes sense to keep only one of them.

Take some time to examine the features of volunteer in the console, and try to identify the redundant features.
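One way to confirm that columns are redundant is to check whether one column fully determines another: if every postalcode maps to exactly one locality and one region, those two columns add no new information. The sketch below illustrates this on made-up data; the frame `volunteer_demo`, its values, and the helper `determines` are hypothetical and not part of the exercise.

```python
import pandas as pd

# Hypothetical toy data standing in for the location columns of a real dataset
volunteer_demo = pd.DataFrame({
    "locality": ["Manhattan", "Manhattan", "Brooklyn", "Bronx"],
    "region": ["New York", "New York", "New York", "New York"],
    "postalcode": [10001, 10001, 11201, 10451],
})

# List the columns so overlapping ones are easy to spot
print(volunteer_demo.columns.tolist())

def determines(df, source, target):
    """Return True if each value of `source` maps to at most one value of `target`."""
    return bool((df.groupby(source)[target].nunique() <= 1).all())

# If postalcode pins down both locality and region, those two are redundant
print(determines(volunteer_demo, "postalcode", "locality"))  # True
print(determines(volunteer_demo, "postalcode", "region"))    # True
```

The check is one-directional: here region does not determine postalcode, which is why postalcode is the one worth keeping.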

This exercise is part of the course

Preprocessing for Machine Learning in Python


Exercise instructions

  • Create a list of redundant column names and store it in the to_drop variable:
    • Out of all the location-related features, keep only postalcode.
    • Original columns whose information has already been captured by engineered features (for example, a column that was transformed or encoded into new columns) are redundant as well.
  • Drop the columns in the to_drop list from the dataset.
  • Print out the .head() of volunteer_subset to see the selected columns.
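The selection steps above can be sketched on a small made-up frame. Everything here is hypothetical: `volunteer_demo`, `to_drop_demo`, and the engineered `vol_requests_log` column stand in for whatever your dataset actually contains, so the real to_drop list for the exercise may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: three location columns plus a raw count column
volunteer_demo = pd.DataFrame({
    "locality": ["Manhattan", "Brooklyn", "Bronx"],
    "region": ["New York", "New York", "New York"],
    "postalcode": [10001, 11201, 10451],
    "vol_requests": [50, 2, 500],
})

# Feature engineering step: a log-scaled copy of vol_requests
volunteer_demo["vol_requests_log"] = np.log(volunteer_demo["vol_requests"])

# The raw column is now redundant, as are the location columns other than postalcode
to_drop_demo = ["locality", "region", "vol_requests"]

# axis=1 tells drop() to remove columns rather than rows
volunteer_demo_subset = volunteer_demo.drop(to_drop_demo, axis=1)

print(volunteer_demo_subset.head())
```

Passing `columns=to_drop_demo` instead of a positional list plus `axis=1` is an equivalent, more explicit spelling.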

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a list of redundant column names to drop
to_drop = ["____", "____", "____", "____", "____"]

# Drop those columns from the dataset
volunteer_subset = ____.____(____, ____)

# Print out the head of volunteer_subset
print(____)