1. Tuning xgboost hyperparameters in a pipeline
We are going to finish off this chapter, and the course, by seeing how automated hyperparameter tuning for an XGBoost model works within a scikit-learn pipeline. Once you have this down, you'll be able to build some of the most powerful, well-tuned machine learning models available today in an automated, reproducible manner.
2. Tuning XGBoost hyperparameters in a pipeline
We will again use the Boston housing dataset to motivate our use of pipelines and hyperparameter tuning.
As always, we first import what we will be using. The only difference is that now we also import RandomizedSearchCV from the scikit-learn model_selection submodule.
We then load in our data, create our feature matrix X and target vector y, and build our pipeline, which includes both the standard scaling step and a base XGBRegressor object with all default parameters.
At this point, you need to create the grid of parameters over which you will search. In order for the hyperparameters to be passed to the appropriate step, you have to name each key in the dictionary with the name of the step being referenced, followed by two underscores, and then the name of the hyperparameter you want to iterate over. Since the xgboost step is called xgb_model, all of our hyperparameter keys will start with xgb_model__. In the example, we will tune subsample, max_depth, and colsample_bytree, and give each parameter a range of possible values. We then pass the pipeline in as the estimator to RandomizedSearchCV and the parameter grid to param_distributions. Everything else is as you've seen before, with appropriate scoring and cross-validation parameters passed in as well. Once that's done, all you need to do is fit the RandomizedSearchCV object, passing in the X and y objects we created earlier.
3. Tuning XGBoost hyperparameters in a pipeline II
Finally, once you've fit the RandomizedSearchCV object, you can inspect the best score it found and convert it to an RMSE.
You can also inspect the best model it found and print it to the screen.
4. Let's finish this up!
Ok, last coding exercise of the course, let's finish this up!