Your first pipeline - again!

Back at the arrhythmia startup, your monthly review is coming up, and as part of it an expert Python programmer will review your code. You decide to tidy up by following best practices and replacing your script for feature selection and random forest classification with a pipeline. You have a training dataset available as X_train and y_train, along with several tools: RandomForestClassifier, SelectKBest() and f_classif() for feature selection, as well as GridSearchCV and Pipeline.

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Create a pipeline with the feature selector given by the sample code, and a random forest classifier. Name the first step feature_selection.
  • Add two key-value pairs to params: one for the number of features k in the selector, with values 10 and 20, and one for n_estimators in the forest, with values 2 and 5.
  • Initialize a GridSearchCV object with the given pipeline and parameter grid.
  • Fit the object to the data and print the best performing parameter combination.
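
The keys in params follow scikit-learn's `<step name>__<parameter name>` convention, so the names you give the pipeline steps determine the names the grid search expects. A minimal sketch of how to check this, assuming the step names `feature_selection` and `clf` from the sample code:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Same step names as in the sample code
pipe = Pipeline([
    ('feature_selection', SelectKBest(f_classif)),
    ('clf', RandomForestClassifier(random_state=2))])

# get_params() lists every tunable parameter as '<step name>__<parameter name>'
param_names = sorted(pipe.get_params().keys())
print([p for p in param_names if p.endswith(('__k', '__n_estimators'))])
```

Any name printed here is a valid key for the parameter grid; a key that does not appear in `get_params()` makes the grid search raise an error when fitting.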

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create pipeline with feature selector and classifier
pipe = ___([
    (___, SelectKBest(f_classif)),
    ('clf', ___(random_state=2))])

# Create a parameter grid
params = {
    'feature_selection__k': ___,
    ___: [2, 5]}

# Initialize the grid search object
grid_search = ___(___, ___=params)

# Fit it to the data and print the best value combination
print(grid_search.fit(___, ___).___)
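
For reference, here is one way the completed template could look. This is a sketch rather than the official solution: the course's X_train and y_train are not available here, so a synthetic dataset from `make_classification` stands in for them (an assumption), with 30 features so that both k=10 and k=20 are valid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the course's training data (assumption, not the real dataset)
X_train, y_train = make_classification(
    n_samples=200, n_features=30, n_informative=10, random_state=0)

# Create pipeline with feature selector and classifier
pipe = Pipeline([
    ('feature_selection', SelectKBest(f_classif)),
    ('clf', RandomForestClassifier(random_state=2))])

# Create a parameter grid keyed by '<step name>__<parameter name>'
params = {
    'feature_selection__k': [10, 20],
    'clf__n_estimators': [2, 5]}

# Initialize the grid search object (default cross-validation settings)
grid_search = GridSearchCV(pipe, param_grid=params)

# Fit it to the data and print the best value combination
print(grid_search.fit(X_train, y_train).best_params_)
```

Because the feature selector sits inside the pipeline, it is refit on each cross-validation training fold, so the held-out fold never influences which features are selected.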