Objective (loss) functions and base learners
1. Objective (loss) functions and base learners
Let's talk a bit about objective functions and base learners so we can develop better intuitions about both concepts; understanding them is essential for grasping why XGBoost is such a powerful approach to building supervised regression models.
2. Objective Functions and Why We Use Them
An objective or loss function quantifies how far off our prediction is from the actual result for a given data point. It maps the difference between the prediction and the target to a real number. When we construct any machine learning model, we do so in the hope that it minimizes the loss function across all of the data points we pass in. That's our ultimate goal: the smallest possible loss.
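To make this concrete, here is a minimal sketch of a squared-error loss computed over a handful of data points; the numbers are made up purely for illustration:

```python
import numpy as np

# Hypothetical targets and model predictions for three data points
y_true = np.array([3.0, 2.5, 4.0])
y_pred = np.array([2.8, 3.0, 3.5])

# The squared-error loss maps each prediction/target difference to a
# non-negative real number; training aims to minimize the average loss.
squared_errors = (y_pred - y_true) ** 2
print(squared_errors.mean())
```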
3. Common loss functions and XGBoost
Loss functions have specific naming conventions in XGBoost. For regression models, the most common loss function used is reg:squarederror. For binary classification models, the most common loss functions used are reg:logistic, when you simply want the category of the target, and binary:logistic, when you want the actual predicted probability of the positive class. So, in Chapter 1, we were implicitly using the reg:logistic loss function when building our classification models in XGBoost.
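As a quick illustration of how these names appear in code, they are passed as the objective parameter when a model is created. This is just a sketch; the particular estimators shown are examples, not code from the video:

```python
import xgboost as xgb

# Regression with the squared-error loss
reg_model = xgb.XGBRegressor(objective="reg:squarederror")

# Binary classification returning predicted probabilities of the positive class
clf_model = xgb.XGBClassifier(objective="binary:logistic")
```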
4. Base learners and why we need them
As mentioned before, XGBoost is an ensemble learning method composed of many individual models that are added together to generate a single prediction. Each of the individual models that are trained and combined is called a base learner. The goal of XGBoost is to have base learners that are slightly better than random guessing on certain subsets of training examples, and uniformly bad at the remainder, so that when all of the predictions are combined, the uniformly bad predictions cancel out and those slightly better than chance combine into a single very good prediction. Let's look at a couple of examples using tree and linear base learners in XGBoost.
5. Trees as base learners example: Scikit-learn API
Here's an example of how to train an XGBoost regression model with trees as base learners using XGBoost's scikit-learn compatible API. We will use the Boston Housing dataset from UCI's machine learning repository as an example. In lines 1-5 we import the libraries we need and load in the data. In lines 6 and 7, we convert our data into our X matrix and y vector and split them into training and test sets as we've done before. In lines 8-10, we create our XGBoost regressor object, this time making sure we use the reg:squarederror objective function, fit it to our training data, and generate our predictions on the test set.
6. Trees as base learners example: Scikit-learn API
And finally, in lines 11 and 12, we compute the RMSE and print the result to screen.
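The slide code walked through above is not reproduced in this transcript, so here is a rough sketch of what it would look like; the CSV file name and the specific hyperparameter values (test_size, random_state, n_estimators) are assumptions for illustration, not taken from the video:

```python
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the Boston Housing data (file name is an assumption)
boston_data = pd.read_csv("boston_housing.csv")

# Convert to feature matrix X and target vector y, then split
X, y = boston_data.iloc[:, :-1], boston_data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

# Regressor with the reg:squarederror objective; fit and predict
xg_reg = xgb.XGBRegressor(objective="reg:squarederror",
                          n_estimators=10, random_state=123)
xg_reg.fit(X_train, y_train)
preds = xg_reg.predict(X_test)

# Compute RMSE and print it to screen
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % rmse)
```

Trees (gbtree) are XGBoost's default base learner, so the booster does not need to be specified explicitly here.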
7. Linear base learners example: learning API only
To use linear base learners, we have to use the learning API in XGBoost. Here's an example. In lines 1-7 we do what we did before, loading in the appropriate libraries and data. In lines 8 and 9 we convert our training and testing sets into DMatrix objects, as is required by the learning API. In line 10 we create a parameter dictionary explicitly specifying the base learner we want, gblinear, and the reg:squarederror objective function we want to use. In lines 11 and 12 we train our model on the training set and generate predictions on the test set.
8. Linear base learners example: learning API only
In lines 13 and 14, we compute our RMSE and print it to screen, as we did before.
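Again, the slide code itself is not included in the transcript, so the sketch below shows roughly what it would look like; as before, the file name and hyperparameter values (test_size, random_state, num_boost_round) are assumptions:

```python
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the data and split as before (file name is an assumption)
boston_data = pd.read_csv("boston_housing.csv")
X, y = boston_data.iloc[:, :-1], boston_data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

# The learning API requires the data to be wrapped in DMatrix objects
DM_train = xgb.DMatrix(data=X_train, label=y_train)
DM_test = xgb.DMatrix(data=X_test, label=y_test)

# Parameter dictionary: gblinear base learner, squared-error objective
params = {"booster": "gblinear", "objective": "reg:squarederror"}

# Train on the training DMatrix and predict on the test DMatrix
xg_reg = xgb.train(params=params, dtrain=DM_train, num_boost_round=10)
preds = xg_reg.predict(DM_test)

# Compute RMSE and print it to screen
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % rmse)
```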
9. Let's get to work!
Okay, let's get to work!