Gradient boosting flavors
1. Gradient boosting flavors
Welcome to the final lesson of Chapter 3! In this lesson, you'll learn about some variations, or flavors, of the gradient boosting family of algorithms, along with their implementations in Python.
2. Variations of gradient boosting
We'll start with Extreme Gradient Boosting, which is implemented with the XGBoost framework. Then, we'll see the Light Gradient Boosting Machine algorithm and how to use it with LightGBM. Finally, you'll learn about the newest flavor: Categorical Boosting, or CatBoost.
3. Extreme gradient boosting (XGBoost)
XGBoost is a more advanced implementation of the gradient boosting algorithm, optimized for distributed computing during both the training and prediction phases. While gradient boosting is a sequential ensemble, XGBoost uses parallel processing to train each estimator, which speeds up the processing. It's described as a scalable, portable, and accurate solution that can work with huge datasets. To build an XGBoost model, we first import the library with the alias xgb. This gives us access to the XGBClassifier and XGBRegressor classes. The parameters are similar to the ones for Gradient Boosting. However, learning_rate and max_depth have no default value, so we must provide them. The API allows us to train the model and predict with it like with any scikit-learn estimator, as sketched below. DataCamp has an entire course dedicated to XGBoost, which you should check out.
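Here is a minimal sketch of that workflow. The dataset, train/test split, and parameter values are illustrative assumptions, not part of the lesson's exercises.

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Illustrative dataset and split (assumed for this sketch)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Instantiate the classifier, providing learning_rate and max_depth explicitly
clf_xgb = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)

# Train and predict like with any scikit-learn estimator
clf_xgb.fit(X_train, y_train)
y_pred = clf_xgb.predict(X_test)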
4. Light gradient boosting machine
Let's move on to the Light Gradient Boosting Machine, or LightGBM, a framework developed by Microsoft. Compared to XGBoost, LightGBM provides faster training and higher efficiency, and it is lighter in terms of space and memory usage. Being a distributed algorithm, it is optimized for parallel and GPU processing. LightGBM is useful when you are dealing with big datasets but have speed or memory constraints. To train a LightGBM ensemble model, you import the lightgbm library and alias it as lgb, which stands for Light Gradient Boosting. Then, you can use LGBMClassifier or LGBMRegressor depending on your problem. The parameters are similar to the ones for Gradient Boosting, except for max_depth, which is negative one by default, meaning no limit; we must specify its value if a limit is desired. Once instantiated, the model exposes the fit and predict methods like any scikit-learn estimator, as the sketch below shows.
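A rough sketch, assuming the same illustrative dataset and parameter values as before:

import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Illustrative dataset and split (assumed for this sketch)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth defaults to -1 (no limit), so we set it explicitly here
clf_lgb = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)

# Train and predict like with any scikit-learn estimator
clf_lgb.fit(X_train, y_train)
y_pred = clf_lgb.predict(X_test)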
5. Categorical boosting
Categorical Boosting, or CatBoost, is the most recent gradient boosting flavor. It was open sourced by Yandex, a Russian tech company, in April 2017. CatBoost has built-in capacity to handle categorical features, so you don't need to do that preprocessing yourself. It is a fast implementation that can scale to large datasets and run on a GPU if required. CatBoost also provides a user-friendly interface that integrates well with scikit-learn. As with the other variations, to build a CatBoost estimator we import catboost and give it the alias cb. This gives us access to CatBoostClassifier and CatBoostRegressor. Here we also have a similar set of parameters, but as you can notice, their default values are all None, so we must specify them while instantiating the estimator. CatBoost also provides the familiar fit and predict methods, as shown below.
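A minimal sketch, again with an assumed dataset and illustrative parameter values (verbose=0 simply silences the per-iteration training log):

import catboost as cb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Illustrative dataset and split (assumed for this sketch)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Specify the parameters explicitly when instantiating the estimator
clf_cb = cb.CatBoostClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, verbose=0)

# Train and predict like with any scikit-learn estimator
clf_cb.fit(X_train, y_train)
y_pred = clf_cb.predict(X_test)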
6. It's your turn!
To round out this chapter, let's get some practice with these different gradient boosting frameworks.