
Preprocessing within a pipeline

Now that you've seen which steps need to be taken individually to properly process the Ames housing data, let's use the much cleaner and more succinct DictVectorizer approach and put it alongside an XGBRegressor inside a scikit-learn pipeline.
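To get a feel for why this approach is more succinct, the sketch below (using two made-up records rather than the actual Ames data) shows DictVectorizer one-hot encoding a string-valued field and passing a numeric field through in a single fit_transform call; the get_feature_names_out call assumes scikit-learn 1.0 or newer.

from sklearn.feature_extraction import DictVectorizer

# Two illustrative records (not the Ames data): one categorical field, one numeric field
records = [{"Neighborhood": "CollgCr", "LotFrontage": 65.0},
           {"Neighborhood": "Veenker", "LotFrontage": 80.0}]

# sparse=False returns a dense NumPy array instead of a sparse matrix
dv = DictVectorizer(sparse=False)
encoded = dv.fit_transform(records)

print(dv.get_feature_names_out())
# ['LotFrontage' 'Neighborhood=CollgCr' 'Neighborhood=Veenker']
print(encoded)
# [[65.  1.  0.]
#  [80.  0.  1.]]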

This exercise is part of the course

Extreme Gradient Boosting with XGBoost


Exercise instructions

  • Import DictVectorizer from sklearn.feature_extraction and Pipeline from sklearn.pipeline.
  • Fill in any missing values in the LotFrontage column of X with 0.
  • Complete the steps of the pipeline with DictVectorizer(sparse=False) for "ohe_onestep" and xgb.XGBRegressor() for "xgb_model".
  • Create the pipeline using Pipeline() and steps.
  • Fit the pipeline. Don't forget to convert X into a format that DictVectorizer understands by calling the to_dict("records") method on X (see the short example after this list).
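As a small illustration of the conversion mentioned in the last step, the sketch below (with a hypothetical two-row DataFrame standing in for X) shows what to_dict("records") produces: a list with one dict per row, which is exactly the input format DictVectorizer expects.

import pandas as pd

# Hypothetical two-row frame standing in for X
df = pd.DataFrame({"Neighborhood": ["CollgCr", "Veenker"],
                   "LotFrontage": [65.0, 80.0]})

# Each row becomes one {column: value} dict
print(df.to_dict("records"))
# [{'Neighborhood': 'CollgCr', 'LotFrontage': 65.0},
#  {'Neighborhood': 'Veenker', 'LotFrontage': 80.0}]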

Interactive exercise

Try this exercise by completing the sample code below.

# Import necessary modules
____
____

# Fill LotFrontage missing values with 0
X.LotFrontage = ____

# Setup the pipeline steps: steps
steps = [("ohe_onestep", ____),
         ("xgb_model", ____)]

# Create the pipeline: xgb_pipeline
xgb_pipeline = ____

# Fit the pipeline
____
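For reference, here is one way the scaffold above could be completed, following the instructions step by step. It assumes X (the Ames housing features as a pandas DataFrame) and y (the target sale prices) are already loaded in the exercise environment and that xgboost is installed.

# Import necessary modules
import xgboost as xgb
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline

# Fill LotFrontage missing values with 0
X.LotFrontage = X.LotFrontage.fillna(0)

# Setup the pipeline steps: steps
steps = [("ohe_onestep", DictVectorizer(sparse=False)),
         ("xgb_model", xgb.XGBRegressor())]

# Create the pipeline: xgb_pipeline
xgb_pipeline = Pipeline(steps)

# Fit the pipeline: convert X to a list of dicts so DictVectorizer can consume it
xgb_pipeline.fit(X.to_dict("records"), y)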