Get startedGet started for free

Kidney disease case study III: Full pipeline

It's time to piece together all of the transforms along with an XGBClassifier to build the full pipeline!

Besides the numeric_categorical_union that you created in the previous exercise, there are two other transforms needed: the Dictifier() transform which we created for you, and the DictVectorizer().

After creating the pipeline, your task is to cross-validate it to see how well it performs.

This exercise is part of the course

Extreme Gradient Boosting with XGBoost

View Course

Exercise instructions

  • Create the pipeline using the numeric_categorical_union, Dictifier(), and DictVectorizer(sort=False) transforms, and xgb.XGBClassifier() estimator with max_depth=3. Name the transforms "featureunion", "dictifier" "vectorizer", and the estimator "clf".
  • Perform 3-fold cross-validation on the pipeline using cross_val_score(). Pass it the pipeline, pipeline, the features, kidney_data, the outcomes, y. Also set scoring to "roc_auc" and cv to 3.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create full pipeline
pipeline = ____([
                     ("____", ____),
                     ("____", ____),
                     ("____", ____),
                     ("____", ____)
                    ])

# Perform cross-validation
cross_val_scores = ____(____, ____, ____, ____="____", ____=____)

# Print avg. AUC
print("3-fold AUC: ", np.mean(cross_val_scores))
Edit and Run Code