Custom function transformers in pipelines

At some point, you were told that the sensors might perform poorly for obese individuals. You previously dealt with that using weights, but now you think this information might also be useful for feature engineering, so you decide to replace each individual's recorded weight with an indicator of whether they are obese. You want to do this using pipelines. You have numpy available as np, along with RandomForestClassifier(), FunctionTransformer(), and GridSearchCV().

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Define a custom feature extractor. This is a function that will output a modified copy of its input.
  • Replace each value of the first column with the indicator of whether that value is above a threshold given by a multiple of the column mean.
  • Convert the feature extractor above to a transformer and place it in a pipeline together with a random forest classifier.
  • Use grid search CV to try values 1, 2 and 3 for the multiplication constant multiplier in your feature extractor.
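The first two steps above can be sketched as follows. This is a minimal illustration, not the official solution: it assumes X is a NumPy array, and it keeps the column index 1 used in the exercise template (the second column under zero-based indexing). The toy array is made up for the demonstration.

```python
import numpy as np

def more_than_average(X, multiplier=1.0):
    # Copy the input so the caller's array is left unchanged
    Z = X.copy()
    # Overwrite the column with an indicator: is the value above
    # multiplier times the column mean? (index 1 follows the template)
    Z[:, 1] = Z[:, 1] > multiplier * np.mean(Z[:, 1])
    return Z

# Hypothetical toy data: column 1 plays the role of recorded weight
X = np.array([[1.0, 60.0], [2.0, 80.0], [3.0, 130.0]])
Z = more_than_average(X, multiplier=1.0)
```

Because the array holds floats, the boolean indicator is stored as 0.0 or 1.0. Note that the mean is recomputed on whatever batch the function receives, so the threshold is not learned from the training data.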

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Define a feature extractor to flag very large values
def more_than_average(X, multiplier=1.0):
  Z = ____
  Z[:,1] = ____ > multiplier*np.mean(Z[:,1])
  return Z

# Convert your function so that it can be used in a pipeline
pipe = Pipeline([
  ('ft', ____(____)),
  ('clf', RandomForestClassifier(random_state=2))])

# Optimize the parameter multiplier using GridSearchCV
params = ____
grid_search = GridSearchCV(pipe, param_grid=params)
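One possible way to complete the template is sketched below, under stated assumptions. In current scikit-learn, FunctionTransformer passes extra arguments to the wrapped function through its kw_args parameter, so this sketch tunes 'ft__kw_args' with one dictionary per candidate value; the course's expected answer may be phrased differently. The synthetic data and labels are invented purely so the example runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def more_than_average(X, multiplier=1.0):
    # Return a modified copy; column index 1 follows the template
    Z = X.copy()
    Z[:, 1] = Z[:, 1] > multiplier * np.mean(Z[:, 1])
    return Z

# Wrap the function so it can be used as a pipeline step
pipe = Pipeline([
    ('ft', FunctionTransformer(more_than_average)),
    ('clf', RandomForestClassifier(random_state=2))])

# Assumption: the multiplier is supplied via kw_args, so the grid
# lists one kw_args dictionary per candidate value (1, 2 and 3)
params = {'ft__kw_args': [{'multiplier': m} for m in (1, 2, 3)]}
grid_search = GridSearchCV(pipe, param_grid=params)

# Hypothetical synthetic data, only for demonstration
rng = np.random.RandomState(0)
X = rng.uniform(40, 150, size=(60, 3))
y = (X[:, 1] > 100).astype(int)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

After fitting, grid_search.best_params_ holds the kw_args dictionary that achieved the best cross-validated score, and grid_search.cv_results_ lists the scores for all three candidates.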