1. Regularization and base learners in XGBoost
Loss functions in XGBoost don't just take into account how close a model's predictions are to the actual values,
2. Regularization in XGBoost
but also take into account how complex the model is. This idea of penalizing models as they become more complex is called regularization. So, loss functions in XGBoost are used to find models that are both accurate and as simple as they can possibly be. There are several parameters that can be tweaked in XGBoost to limit model complexity by altering the loss function.
Gamma is a parameter for tree base learners that controls whether a given node will split, based on the expected reduction in the loss that would occur after performing the split; higher gamma values lead to fewer splits.
Alpha is another name for L1 regularization. However, this regularization term is a penalty on leaf weights rather than on feature weights, as is the case in linear or logistic regression. Higher alpha values lead to stronger L1 regularization, which causes many leaf weights in the base learners to go to 0.
Lambda is another name for L2 regularization. L2 regularization is a much smoother penalty than L1 and causes leaf weights to smoothly decrease, instead of enforcing strong sparsity constraints on the leaf weights as in L1.
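To give a rough sketch of where these knobs live, all three can be set in the same parameter dictionary you pass to XGBoost's training or cross-validation functions. The specific values below are placeholders for illustration, not recommendations.

```python
# Hypothetical parameter dictionary combining the three regularization knobs
params = {
    "objective": "reg:squarederror",  # loss function for regression
    "max_depth": 4,                   # cap on tree depth (another complexity control)
    "gamma": 10,       # minimum loss reduction required to make a further split
    "alpha": 0.1,      # L1 penalty on leaf weights
    "lambda": 1.0,     # L2 penalty on leaf weights
}
```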
If you're interested in learning more about regularization, check out DataCamp's Supervised Learning with scikit-learn course.
3. L1 regularization in XGBoost example
Let's look at an example of how you can tune one of these regularization parameters using XGBoost.
As always, in lines 1-4 we import the necessary libraries, load in the data we will be working with, and create our feature matrix X and target vector y.
In line 5 we convert our X matrix and y vector into a single optimized DMatrix object, and in line 6 we create our parameter dictionary that defines some required parameters for our learner. Specifically, we provide the loss function necessary for regression, and the maximum depth each tree base learner can have.
In line 7 we create a list of 3 different L1 or alpha values that we will try, and in line 8 we initialize an empty list that will store our final root mean square error for each of these L1 or alpha values.
Line 9 is actually a multi-line for loop where we iterate over each entry in our l1_params list and do the following.
First, we create a new key-value pair in our parameter dictionary that holds our current alpha value.
We then run our XGBoost cross validation by passing in our DMatrix object, the updated parameter dictionary, the number of folds we want to use for cross-validation, the number of trees we want as num_boost_round, the metric we want to compute, which is rmse, and that we want to output the results as a pandas DataFrame.
In lines 10 and 11, we simply look at the final RMSE as a function of L1 regularization strength.
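Putting the walkthrough together, the code on the slide might look roughly like the sketch below. The CSV filename and the assumption that the target is the last column are placeholders; the alpha loop and the xgb.cv call follow the steps just described.

```python
import pandas as pd
import xgboost as xgb

# Load the data and split it into a feature matrix X and target vector y
# (the filename and column layout are placeholders)
housing_data = pd.read_csv("housing_data.csv")
X, y = housing_data.iloc[:, :-1], housing_data.iloc[:, -1]

# Convert X and y into a single optimized DMatrix object
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Required parameters: the regression loss and the maximum tree depth
params = {"objective": "reg:squarederror", "max_depth": 4}

# L1 (alpha) values to try, and a list to hold the final RMSE for each
l1_params = [1, 10, 100]
rmses_l1 = []

for reg in l1_params:
    params["alpha"] = reg  # update the parameter dictionary with the current alpha
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=4,
                        num_boost_round=10, metrics="rmse",
                        as_pandas=True, seed=123)
    # keep the cross-validated test RMSE from the final boosting round
    rmses_l1.append(cv_results["test-rmse-mean"].tail(1).values[0])

# Look at the final RMSE as a function of L1 regularization strength
print(pd.DataFrame(list(zip(l1_params, rmses_l1)), columns=["l1", "rmse"]))
```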
4. Base learners in XGBoost
At this point, we've talked about base learners and regularization quite a bit. Let's finish this off by comparing the two kinds of base learners that exist in XGBoost.
The linear base learner is simply a sum of linear terms, exactly as you would find in a linear or logistic regression model. When you combine many of these base models into an ensemble, you get a weighted sum of linear models, which is itself linear. Since you don't get any nonlinear combination of features in the final model, this approach is rarely used, as you can get identical performance from a regularized linear model.
The tree base learner uses decision trees as base models. When the decision trees are all combined into an ensemble, their combination is a nonlinear function of the features, since each individual tree is itself nonlinear.
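In case it helps to see where this choice is made, the base learner is selected through the booster key of the parameter dictionary. This is just a sketch; the other entries in the two dictionaries below are placeholders.

```python
# Tree base learner (the default): an ensemble of decision trees
tree_params = {"booster": "gbtree", "objective": "reg:squarederror", "max_depth": 4}

# Linear base learner: a weighted sum of linear models, which is itself linear
linear_params = {"booster": "gblinear", "objective": "reg:squarederror"}
```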
5. Creating DataFrames from multiple equal-length lists
At this point, I want to briefly mention how you'll see DataFrames being created in the next couple of exercises after you've computed your results. We will use both the zip and list functions, one inside the other, to convert multiple equal-length lists into a single object that we can convert into a pandas DataFrame.
Zip is a function that allows you to take multiple equal-length lists and iterate over them in parallel, side by side, as shown above. However, in Python 3, zip creates a generator, that is, an object that doesn't have to be completely instantiated at runtime. In order for the entire zipped pair of lists to be instantiated, we have to cast the zip generator object into a list directly. After casting, we can convert this object directly into a DataFrame.
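Here is a minimal sketch of that pattern; the two lists and their values are placeholders standing in for whatever results you compute.

```python
import pandas as pd

# Two hypothetical equal-length lists: parameter values and their resulting scores
l2_params = [1, 10, 100]
best_rmse = [7.1, 6.8, 7.4]  # placeholder values for illustration only

# zip pairs the lists element-wise; list() materializes the generator it returns
paired = list(zip(l2_params, best_rmse))

# The list of tuples converts directly into a pandas DataFrame
print(pd.DataFrame(paired, columns=["l2", "rmse"]))
```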
The point of all of this is: don't feel overwhelmed when you see this pattern in the following exercises, as it's very useful and will only make you a stronger Python programmer and data scientist.
6. Let's practice!
Now, it's your turn to perform L2 regularization with an XGBoost model in the following exercises. In addition, you'll learn how to visualize feature importances in your model.