Selecting relevant features

In this exercise, you'll identify the redundant columns in the volunteer dataset, and perform feature selection on the dataset to return a DataFrame of the relevant features.

For example, if you explore the volunteer dataset in the console, you'll see three features which are related to location: locality, region, and postalcode. They contain related information, so it would make sense to keep only one of the features.

Take some time to examine the features of volunteer in the console, and try to identify the redundant features.

Deze oefening maakt deel uit van de cursus

Preprocessing for Machine Learning in Python

Cursus bekijken

Oefeninstructies

Create a list of redundant column names and store it in the to_drop variable:
- Out of all the location-related features, keep only postalcode.
- Features that have gone through the feature engineering process are redundant as well.
Drop the columns in the to_drop list from the dataset.
Print out the .head() of volunteer_subset to see the selected columns.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Create a list of redundant column names to drop
to_drop = ["____", "____", "____", "____", "____"]

# Drop those columns from the dataset
volunteer_subset = ____.____(____, ____)

# Print out the head of volunteer_subset
print(____)

Code bewerken en uitvoeren