Kidney disease case study III: Full pipeline
It's time to piece together all of the transforms along with an XGBClassifier
to build the full pipeline!
Besides the numeric_categorical_union that you created in the previous exercise, there are two other transforms needed: the Dictifier() transform, which we created for you, and the DictVectorizer().
After creating the pipeline, your task is to cross-validate it to see how well it performs.
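To see why the two extra transforms are needed, here is a minimal sketch of how they fit together. The source of Dictifier() is not shown in this exercise, so the class below is a hypothetical stand-in that assumes it simply turns each row of an array into a dictionary; the toy array X is also invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer


class Dictifier(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: convert a 2-D array or DataFrame to a list of row dicts."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the transformer interface.
        return self

    def transform(self, X):
        df = pd.DataFrame(X)
        # Use string keys so every feature name is a plain string.
        df.columns = df.columns.astype(str)
        return df.to_dict("records")


# Toy numeric array, standing in for the output of a feature union.
X = np.array([[48.0, 1.0], [62.0, 0.0]])

# Step 1: array -> list of dictionaries, one per row.
rows = Dictifier().fit_transform(X)

# Step 2: list of dictionaries -> sparse feature matrix.
vectorizer = DictVectorizer(sort=False)
X_vec = vectorizer.fit_transform(rows)

print(rows)
print(X_vec.toarray())
```

With sort=False, the vectorizer keeps the features in the order in which it first encounters them instead of sorting them by name.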
This exercise is part of the course
Extreme Gradient Boosting with XGBoost
Exercise instructions
- Create the pipeline using the numeric_categorical_union, Dictifier(), and DictVectorizer(sort=False) transforms, and the xgb.XGBClassifier() estimator with max_depth=3. Name the transforms "featureunion", "dictifier", and "vectorizer", and the estimator "clf".
- Perform 3-fold cross-validation on the pipeline using cross_val_score(). Pass it the pipeline, pipeline, the features, kidney_data, and the outcomes, y. Also set scoring to "roc_auc" and cv to 3.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create full pipeline
pipeline = ____([
    ("____", ____),
    ("____", ____),
    ("____", ____),
    ("____", ____)
])
# Perform cross-validation
cross_val_scores = ____(____, ____, ____, ____="____", ____=____)
# Print avg. AUC
print("3-fold AUC: ", np.mean(cross_val_scores))