
Your first pipeline - again!

Back at the arrhythmia startup, your monthly review is coming up, and as part of it an expert Python programmer will review your code. You decide to tidy up by following best practices and replacing your feature selection and random forest classification script with a pipeline. You are using a training dataset available as X_train and y_train, along with the following classes and functions: RandomForestClassifier, SelectKBest() and f_classif() for feature selection, and GridSearchCV and Pipeline.
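To see what the feature selector does on its own, here is a minimal sketch of SelectKBest with f_classif. It assumes synthetic data from make_classification in place of the arrhythmia dataset; the names X_demo, y_demo and selector are illustrative only. f_classif computes an ANOVA F-statistic for each feature against the class labels, and SelectKBest keeps the k features with the highest scores.

```python
# Sketch: univariate feature selection with SelectKBest and the ANOVA F-test.
# Assumption: make_classification stands in for the arrhythmia X_train / y_train.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data: 200 samples, 50 features, 5 of them informative
X_demo, y_demo = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)

# Score every feature with f_classif and keep the 10 highest-scoring ones
selector = SelectKBest(f_classif, k=10)
X_selected = selector.fit_transform(X_demo, y_demo)

print(X_demo.shape, "->", X_selected.shape)  # (200, 50) -> (200, 10)
print("kept feature indices:", selector.get_support(indices=True))
```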

This exercise is part of the course Designing Machine Learning Workflows in Python.


Instructions

  • Create a pipeline containing the feature selector given in the sample code, followed by a random forest classifier. Name the first step feature_selection.
  • Add two key-value pairs to params: one for the number of features k in the selector, with values 10 and 20, and one for n_estimators in the forest, with possible values 2 and 5.
  • Initialize a GridSearchCV object with the given pipeline and parameter grid.
  • Fit the object to the data and print the best-performing parameter combination.
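The four steps above can be sketched end to end as follows. This is one possible completion, not the official solution, and it assumes synthetic data from make_classification standing in for the X_train and y_train provided by the course, so the printed best combination will differ from the one you get in the exercise.

```python
# Sketch of the completed exercise on synthetic data.
# Assumption: make_classification replaces the course's X_train / y_train.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X_train, y_train = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=2
)

# Step 1: feature selector (named 'feature_selection') followed by the forest
pipe = Pipeline([
    ('feature_selection', SelectKBest(f_classif)),
    ('clf', RandomForestClassifier(random_state=2))])

# Step 2: grid keys follow the pattern '<step name>__<parameter name>'
params = {
    'feature_selection__k': [10, 20],
    'clf__n_estimators': [2, 5]}

# Step 3: the grid search refits the whole pipeline for every combination
grid_search = GridSearchCV(pipe, param_grid=params)

# Step 4: fit() returns the fitted search object, so best_params_ can be chained
print(grid_search.fit(X_train, y_train).best_params_)
```

Because the selector sits inside the pipeline, it is refit on each cross-validation training fold, so the F-test scores never see the validation fold. That is the main benefit of tuning k inside the pipeline rather than selecting features once beforehand.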

Hands-on interactive exercise

Try this exercise by completing this sample code.

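# Classes and functions used in this exercise
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Create pipeline with feature selector and classifier
pipe = ___([
    (___, SelectKBest(f_classif)),
    ('clf', ___(random_state=2))])

# Create a parameter grid
params = {
    'feature_selection__k': ___,
    ___: [2, 5]}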

# Initialize the grid search object
grid_search = ___(___, ___=params)

# Fit it to the data and print the best value combination
print(grid_search.fit(___, ___).___)
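The blank key in the parameter grid has to be a valid pipeline parameter name. A minimal sketch, assuming the same step names as the sample code ('feature_selection' and 'clf'), of how to list the valid names with get_params() and apply them with set_params():

```python
# Sketch: discovering valid grid keys for a pipeline with get_params().
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('feature_selection', SelectKBest(f_classif)),
    ('clf', RandomForestClassifier(random_state=2))])

# Nested parameters are exposed as '<step name>__<parameter name>'
keys = pipe.get_params().keys()
print('feature_selection__k' in keys)  # True
print('clf__n_estimators' in keys)     # True

# The same names work with set_params(), which GridSearchCV calls for each candidate
pipe.set_params(feature_selection__k=10, clf__n_estimators=5)
print(pipe.named_steps['clf'].n_estimators)  # 5
```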