
Exercise

Your first pipeline - again!

Back at the arrhythmia startup, your monthly review is coming up, and as part of it an expert Python programmer will review your code. You decide to tidy up by following best practices and replacing your script for feature selection and random forest classification with a pipeline. The training dataset is available as X_train and y_train, along with several scikit-learn objects: RandomForestClassifier, SelectKBest() and f_classif() for feature selection, as well as GridSearchCV and Pipeline.

Instructions

  • Create a pipeline with the feature selector given by the sample code, and a random forest classifier. Name the first step feature_selection.
  • Add two key-value pairs to params: one for the number of features k in the selector, with possible values 10 and 20, and one for n_estimators in the forest, with possible values 2 and 5.
  • Initialize a GridSearchCV object with the given pipeline and parameter grid.
  • Fit the object to the data and print the best performing parameter combination.
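
One possible way to work through the steps above is sketched below. This is not the exercise's reference solution. Because the real X_train and y_train are not available here, the sketch generates a synthetic stand-in with make_classification; the step name clf for the forest, the random_state values, and cv=3 are also assumptions made for illustration. Only the step name feature_selection and the parameter values come from the instructions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the exercise's X_train / y_train (assumption).
X_train, y_train = make_classification(
    n_samples=200, n_features=30, n_informative=8, random_state=0
)

# Step 1: a pipeline with the feature selector followed by a random forest.
# "feature_selection" is the name required by the instructions;
# "clf" is a hypothetical name chosen for this sketch.
pipe = Pipeline([
    ("feature_selection", SelectKBest(f_classif)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Step 2: parameter grid. Keys follow the "<step name>__<parameter>" pattern
# that scikit-learn uses to route a value to a pipeline step.
params = {
    "feature_selection__k": [10, 20],
    "clf__n_estimators": [2, 5],
}

# Step 3: grid search over the pipeline (cv=3 is an assumption).
grid_search = GridSearchCV(pipe, param_grid=params, cv=3)

# Step 4: fit and report the best performing parameter combination.
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
```

Putting the selector inside the pipeline means that feature selection is refit on each cross-validation training fold, so the held-out fold never influences which features are chosen.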